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In this Issue 

When engineers talk about millimeter waves they're referring to elec- 
tromagnetic energy in the frequency range of 26 to 300 gigahertz, more or 
less. Although this region of the electromagnetic spectrum is beginning to 
be used for high-speed communications, high-resolution radar, guidance 
systems, and radio astronomy, circuit elements such as transistors are avail- 
able only for the low end of this frequency range, and integrated circuits are 
hard to come by- In the last few years, Hewlett-Packard Laboratories and 
the HP Microwave Technology Division have achieved notable success in 
developing small-scale integrated and hybrid circuits that operate at 100 
GHz and beyond. Using gallium arsenide and related compounds and a process known as 
molecular beam epitaxy (MBE), HP engineers have fabricated Schottky and modified barrier 
diodes, along with ICs and hybrids based on these diodes. Mainly useful for nonlinear and 
frequency translation applications such as detectors, samplers, multipliers, and mixers, they are 
now available in several new millimeter- wave products from HP, On page 4 T Doug Collins of HP 
Laboratories tells us how MBE works. Modified barrier diodes and their uses in products are 
discussed in the article on page 10. Fabrication of ICs and hybrids is the subject of the article on 
page 14, and the article on page 22 describes a series of mixers that use the new GaAs diodes 
to extend the range of microwave spectrum analyzers to 110 GHz. The cover photograph shows 
several of these mixers with their horn antennas of various sizes for operation in different frequency 
bands. 

The remainder of this issue is devoted to several papers on unusual software tools for the 
support of HP computer systems in the field. These are forward-looking efforts, aiming to predict 
failures before they happen and to use artificial intelligence (Al) technology in expert systems for 
troubleshooting and configuring computer systems. Predictive Support (page 30) is a software 
package that lives on a customer's system, counts soft errors, and notifies HP support personnel 
when it spots an ominous trend. Reports indicate that it works and customers like it. AIDA (page 
34) is an expert system that helps HP support personnel analyze HP 3000 Computer memory 
dumps. A dump is a last-resort method of finding a computer problem. It may contain many 
megabytes of information and is hugely complex to analyze. AID As major contribution is its 
formatting capability, which helps the human expert organize all that data, That it can also find 
corrupt data automatically is considered a bonus by many HP troubfeshooters. Schooner and 
IPT (pages 42 and 48) are expert systems for troubleshooting datacom links and disc drives, 
respectively. They've proved useful for trouble shooters learning the business and for solving the 
easier problems, leaving experienced personnel more time to tackle the tougher problems, Mycon 
(page 54) is a prototype expert configurator for computer systems. Worthy of note is its concept 
of multilevel constraints, which supplement and refine traditional approaches to the design of 
expert configurators. 

R, P. Dolan 



What's Ahead 

The December issue will be another in our series covering HP Precision Architecture topics. 
Subjects included are the HP-UX operating system for technical and real-time HP Precision 
Architecture computers such as the HP 9000 Model 840, and ALL8ASE, the data base manage- 
ment system for both technical and commercial HP Precision Architecture computers. December 
is also our annual index issue. 
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Molecular-Scale Engineering of 
Compound Semiconductor Materials 

The ever increasing sophistication of semiconductor 
electronic devices and integrated circuits continues to 
place increasing demands on the precision with which the 
underlying semiconductor materials are produced. The 
development of molecular beam epitaxy allows the highly 
re peat able growth of compound semiconductor epitaxial 
films (such as GaAs and Al x Ga 1 - x As) with atomicalty abrupt 
changes in alloy composition and doping and with excellent 
uniformity. 

by Douglas M. Collins 



SEMICONDUCTING MATERIALS provide the foun- 
dation upon which today's high -techno logy elec- 
tronics industry is built, While silicon is the most 
widely used semiconducting material, and that for which 
the most advanced technology has been developed, its 
physical properties are not suitable for many of today's 
device needs. Thus, the use of other semiconducting ma- 
terials whose electronic band structure leads to specific 
properties that make them more suitable than silicon for 
specific applications is becoming more and more common. 
One such class of semiconductors is that known as the 
group III-V compound semiconductors. This name is de- 
rived from the fact that these materials are compounds of 
elements found in columns III and V of the periodic table. 
The most common example is gallium arsenide (GaAs), 



which is widely used in microwave and optoelectronic 
devices. 

To produce high-quality compound semiconductor ma- 
terials for high-performance electronic or optoelectronic 
devices, it is necessary to use epitaxial growth techniques. 
This is especially true for those devices that require 
heterojunctions (for a discussion of heterojunctions, see 
box on page 6). In general, an epitaxial growth process is 
any process in which chemical elements or compounds 
are deposited onto a single-crystal substrate under condi- 
tions such that the deposited materials become precisely 
arranged upon the substrate, yielding a single-crystal depos- 
ited, or epitaxial layer. In fact, the word epitaxy is derived 
from the Greek words &pi (meaning "on") and tnxis (mean- 
ing arrangement). 
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Fig. 1. Schematic representation 
of the molecular beam epitaxy 
(MBE) technique. 
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To push devices to higher and higher levels of perfor- 
mance, ii is necessary to control the epitaxial growth pro- 
cess precisely. The molecular beam epitaxy technique is 
one that provides for the precise growth of complex IH-V 
compound semiconductor structures which are particu- 
larly well-suited for high-speed (e.g.. millimeter- wave) de- 
vice and integrated circuit applications. 

The MBE Technique 

Molecular beam epitaxy (MBE) can best be described as 
a highly controlled ultra high- vacuum evaporation' deposi- 
tion process. The apparatus used for the MBE growth of 
GaAs and AJ^Ga^.^As is shown schematically in Fig 
(For a discussion of alloy semiconductors such as 
Al x Ga 1 _ x As, see box on page 8.) The principal components 
in this system are the resistance-heated source furnaces, 
the source furnace shutters, and the heated substrate sta- 
tion, Three of the source furnaces contain the elements Ga P 
At, and As which, when reacted on the heated single-crystal 
substrate, produce the epitaxial semiconductor layer. The 
other two sources contain silicon and beryllium, the two 
most commonly used dopants in the MBE growth of GaAs 
and Al^Ga^xAs, Silicon substitutes for the gallium or 
aluminum atoms on the group-Ill sublattice resulting in 
one extra electron; hence silicon is an n-type 1 or donor, 
impurity in these materials. Beryllium also substitutes for 
the aluminum or gallium atoms, but with one less electron; 
hence beryllium is a p-type, or acceptor, impurity in these 
materials, 

The rate of evaporation of the source materials, and thus 




Fig. 2. High-resofution transmission electron micrograph of 
atomicaiiy abrupt GaAs/ At As helerojunction formed by MBE 

the flux of atoms or molecules arriving at the substrate 
surface, is determined by the tempera lure of each of the 
sources- Each source temperature is individually con- 
trolled to within ±u\2°C. which results in less than ±1% 
variation m the source flux. 1 The temperatures (and thus 
the fluxes) of gallium and aluminum [both of which evapo- 




Fig. 3. HP's MBE systems (right) are controlled by HP 1000 Computers (left). 
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Compound Semiconductor Alloys and Heterojunctions 



The properties of semiconducting materials that are of practical 
interest depend on the material's crystal structure, the interatomsc 
spacing in the crystal, the electronic band structure, and the 
thermodynamics of defect formation, in particular the incorpora- 
tion of impurity or dopant atoms that alter the electronic properties 
in a desired fashion. These factors combine to determine the 
bandgap energy E g of the semiconductor (and whether the 
bandgap is "direct" or "indirect"), the transport properties of 
electrons and holes (in particular the mobilities of these charge 
carriers), and the practical matter of being able to control the 
electron and hole concentrations over the ranges necessary for 
use in devices. 

Silicon crystallizes into the so-called "diamond" structure (Fig. 
1a), which can be described as two interleaved face-centered 
cubic lattices (an FCC lattice has an atom at every corner of a 
cube in addition to an atom at the center of every face of the 
cube). One FCC lattice has its origin at 0, 0, and the other FCC 
lattice has its origin at a/4, a/4, a/4 (only a portion of the second 
lattice is shown in Fig. 1 ). This results in a teiranedral coordination 
between each silicon atom and its four nearest neighbors. 

Most of the lll-V compound semiconductors, including GaAs, 
crystallize into the so-called "zincblende" structure (Fig. 1b). 
This structure is remarkably similar to that of silicon. In fact, it is 
the diamond structure, except that one of the two interleaved 
FCC sub-lattices is composed of group-Ill atoms (e.g., gallium) 
while the other is composed of group-V atoms (e.g., arsenic). 
Thus each group-Ill atom is tetrahedrally bonded to four nearest- 
neighbor group-V atoms H and each group-V atom is tetrahedrally 
bonded to four nearest-neighbor group-ill atoms. 

The additional complexity of the zincblende structure offers 
a degree of freedom not available in elemental semiconduc- 
tors such as silicon and germanium. This freedom is the ability 
to mix different group-Ill elements on the group-Ill sublattice as 
well as different group-V atoms on the group-V sublattice. Thus 
we can form ternary (e.g.. A^Ga^^As) and quaternary (e.g., 
1n x Ga- xASyP^) alloy semiconductors, where ^ x *= 1 and 
^ y m 1 The subscripts x and y in these chemical formulas 
indicate the fraction (commonly called the mole fraction) of each 
sublattice that is occupied by each element Since, in general, 
different semiconductors have different bandgap energies and 
different electron and hole transport properties, the ability to form 
alloy semiconductors provides the materials scientist with the 
freedom to engineer the material to the needs of the device 
physicist. Thus it is possible, for example, to adjust the bandgap 
of a semiconductor to give the desired wavelength of emission 
for a light-emitting diode. 



In addition to providing the flexibility of engineering semicon- 
ductor material properties by forming these alloys, compound 
semiconductor technology provides an added capability with 
far-reaching consequences — the ability to join two different 
semiconductor materials intimately together to form a semicon- 
ductor heterojunction. This is conceptually very similar to joining 
together p-type and n-type silicon to form a semiconductor pn 
homojunction. However, it leads to many more potential applica- 
tions of lll-V compound semiconductors. For example, if a thin 
layer (approximately 0.2 micrometer) of one semiconductor is 
sandwiched between two layers of a second semiconductor that 
has a larger bandgap and a smaller index of refraction, the center 
layer will serve to confine both charge and light, thus providing 
two of the necessary conditions for the formation of a population 
inversion in this layer, which leads to semiconductor laser ac- 
tion. 1 Such structures are known as double heterojunctions. 
Another technologically important heterojunction is the modula- 
tion-doped heterojunction (see box. page 8), which is important 
to the very high-speed operation of the modulation-doped field 
effect transistor (MODFET, also known as the high-electron-mo- 
bility transistor, or HEMT) 2 

To form semiconductor heterojunctions with a high degree of 
structural perfection, the two semiconductors must have identical 
(or compatible) crystal structures and nearly equat interatomic 
spacings, or lattice constants (dimension a in Fig. 1 ). This require- 
ment can be met by using compound semiconductor alloys that 
have the same crystal structure while simultaneously choosing 
alloy compositions (x and/or y in the above chemical formulas) 
that result in the same lattice constants. This is a condition which 
is referred to as lattice matching, I n the GaAs/AI^Ga! x As material 
system discussed in the accompanying article, the lattice match 
is nearly perfect for all values of x. 

References 

1. H.C. Casey and M.B. Partial! . Heterosiracture Lasers Part. A. Fundamental Princi- 
ples, and Pari B, Materials anri Operating Characteristics, Academic Press. New York, 
1978 

2 H Marfcac and P.M. Solomon. "The HEMT. A Supeiiast Transistor." IEEE Spectrum, 
Vol 21 . Ha 2. February 19M. pp. 2B-35 





(a) 



(b) 



Fig. 1, Semiconductor crystal 
structures (a = lattice constant), 
(a) Diamond structure for silicon, 
carbon, germanium, etc. (b) Zinc- 
blende structure for GaAs, GaP, 
tnSb, etc, Bonds between nearest 
neighboring atoms are shown in 
color. 
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Fig. 4, Free electron concentration at room temperature 
(300K) for Si -doped MBE GaAs as a function of silicon source 
temperature 

rate as monatomic beams) are chosen to give the desired 
growth rale and alloy composition. The temperature of the 
arsenic source (arsenic evaporates as the As 4 molecule) is 
chosen such that the As 4 flux is slightly higher than that 
required for growth of stoichiometric GaAs or Al y Ga t _ x As 
[Le* t where the number of gallium and aluminum atoms 
incorporated into the film is equal tn the number of arsenic 
atoms incorporated into the film). The small excess of ar- 



senic is used to ensure nearly perfect stoichiometry since, 
at the substrate temperatures used (typically 500 to 700°C). 
the arsenic will bond only to gallium or aluminum and not 
to itself. The excess arsenic evaporates from the film sur- 
face. 2 Similarly* the temperatures of the silicon and beryl- 
lium sources are chosen such that the fluxes of thes* 
meats lead to the desired concentrations of n-type and 
p-type doping, respectively. 

The temperatures of the gallium and aluminum sources 
are typically chosen to result in a growih rate of 1.0 /im/ 
hour. This corresponds to only one molecular \ aver of GaAs 
or Al x Ga 1 _ x As per second. Since each source furnace is 
equipped with its own individual shutter that can he 
opened or closed in a small fraction of a second* it Is a 
simple matter to control the composition or doping in an 
epitaxial film on an atomic scale. An example of an atom- 
ically abrupt GaAs/AlAs heterojunction (grown by closing 
the gallium shutter and opening the aluminum shutter 
simultaneously] is shown in the high-resolution transmis- 
sion electron micrograph of Fig, 2. 

The MBE systems in operation at Hewlett-Packard Com- 
pany (e,g f , see Fig. 3) are interfaced with HP 1000 Comput- 
ers that control the temperature setpoints for the source 
furnaces and the substrate, the source shutter operation, 
and the growth times for all layers in multilayer films. In 
addition to having the capability for abruptly changing the 
composition or doping of MBE films, the computer is also 
used to contour alloy compositions and doping profiles 
programmably to achieve specific materials structures for 
use in particularly high-performance devices, 3 

For the high degree of control offered by MBE to be of 
practical use, it is necessary that the undesirable back- 
ground impurity levels in MBE films be as low as possible 
and that the films be highly uniform. These conditions are 
easily met in the MBE growth of GaAs, The ultrahigh vac- 
uum system used in MBE is pumped by ion and titanium 
sublimation pumps and includes extensive liquid-nitrogen 
cxyoshrouding to pump condensable gases such as water 
vapor. Background partial pressures of undesirable gases 
are typically below 10 1 1 ton\ This allows undesirable im- 
purity levels as low as ten parts per billion to be achieved. 
Gontinuous rotation of the substrate during growth leads 
to demonstrated uniformities in thickness {or growth rate] 
and doping concentration of better than ±2% across a two- 
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The Modulation-Doped Heterojunction 



The accompanying article demonstrates that MBE is capable 
of producing GaAs and Al x Ga,_ K As films with a high degree of 
control over doping concentration and alloy composition and 
wrth extremely abrupt heterojunction interfaces. All of these ca- 
pabilities are important for the realization of the high-speed de- 
vice known as the modulation -doped FET The layered material 
structure upon which this device is based is commonly referred 
to as a selectively doped, or modulation-doped heterojunction 
(see Fig, 1). 1 It has been most commonly produced in the GaAs/ 
A^Ga^As material system, although other semiconductor 
heterojunctions have also been used 2 The epitaxial layer struc- 
ture is grown on a high-resistivity (TO 7 ohm-cm) GaAs substrate 
The first layer is undoped GaAs approximately one micrometer 
thick, which must be of very high purity (typically less than lu 1fl 
ionized impurities per cubic centimeter). The second layer is 
undoped Ai K Gai *As (x = 0.3) which, depending on the desired 
properties of the structure, will typically be between 2 and 15 
nanometers thick. This corresponds to only 8 to 60 atomic layers, 
The atomic scale abruptness of the heterojunction between these 
two layers and ihe precise control of the undoped AJ 3 Gao 7 As 
"spacer" layer thickness are critical to the proper performance 
of this device, The third layer is silicon-doped (N Si = 2x10 1B 
cm " 3 ) A^Ga, _ H As with x typically the same as that in the undoped 
spacer layer which, again depending on the desired properties 
of the structure, will be between 10 and 50 nanometers thick 
The fourth, and last, layer is n-type GaAs (n = 2x1G 1B cm 3 ) 
approximately 20 nanometers thick. This layer facilitates making 
low-resistance ohmic contacts to the structure, 

The special properties of this structure that make it of interest 
for high-speed device applications are best understood by refer- 
ring to the energy band diagram shown in Fig. 2 Here we see 
that a conduction band discontinuity is formed at the abrupt 
GaAs/AI K Ga,_ K As interface This is accompanied by depletion 
of electrons from the AJ^Ga-, K As and accumulation of these elec- 
trons in the GaAs. The potential "notch" in which these electrons 
reside confines the electrons to a region so thin that they behave 
as a two-dimensional system. This system is commonly referred 
to as a two-dimensional electron gas This electron gas has many 
similarities to the two-dimensional electron gas in the gate chan- 
nel of a silicon MOSFET However, there are two very important 
differences. First, the GaAs/Al x Ga 1 x As heterojunction is very 
nearly structurally perfect and thus electrons traveling parallel 
to this heterojunction interface are unlikely to scatter from struc- 
tural defects. Second, since the GaAs is of very high purity and 
the ionized silicon donors in the A^Ga^^As are set back from 
the heterojunction interface by the spacer layer thickness, scat- 
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Fig. 2. Energy band diagram for structure of Fig. ? . The two- 
dimensional electron gas formed in the structure is ideal for 
the conducting channel of a FET. 
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tering from tHe silicon donor atoms is minimized Tnese two fac- 
tors result in excellent transport properties for electrons 
modulaftorvdoped fteterojunction structure 

The effect on electron transport properties is shown m Rg 3 
where the significantly enhanced mobilities found in these 
he tero junction structures (top curve) are compared with those 
observed m unrformly doped GaAs and AI^Ga^/As with com- 
parable sheet electron concentrations (In this case r^ = 6* 1G V1 
cm" 2 , which is comparable to the sheet carrier concentration in 
the channel of an FET). This data shows that the mobility in the 
modulation-doped structure is a much stronger function of tern- 
perature than the mobilities m the uniformfy doped films This 
occurs because the mobility at higher temperatures m the mod- 
ulation-doped structure is still dominated by phonon scattering 



ittice vibrations) However, even at room temperature, the 
moduJattorvdoped structure exhibits a mobility that is a factor of 
1 .5 to 2 higher than in the uniformly doped GaAs material fn 
short-gate-Jength MODFETs this results in an increase in the 
electron peak velocity by a factor of 1 5 to 2 and thus a compar- 
able increase in the speed of operation of the device 3 

References 
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tnch diameter substrate. 

MBE Grown GaAs and Al x Ga 1 _ x As 

The realization of the precise control of epitaxial film 
parameters promised above can best be demonstrated by 
presenting results of doping and alloy composition control 
for MBE films. Fig. 4 shows the free electron concentration 
(n, as measured by the Hall effect 4 ) at room temperature 
for Si-doped MBE GaAs as a function of the reciprocal of 
the silicon source temperature. Several important points 
may be made from this data, First, it is clear that we have 
the capability of varying the n-type doping of MBE GaAs 
over a wide range [from less than 10 15 cm" 3 to 5*io 18 
cm" :, f which corresponds to approximately 0,02 to 100 
parts per million) with a high degree of control. In fact, 
the slope of the precisely exponential variation in n with 
l/T S j corresponds closely to the activation energy for the 
evaporation of silicon, indicating that the silicon incorpo- 
ration rate is directly proportional to the silicon flux** The 
two different symbols in Fig. 4 correspond to two different 



series of MBE growth runs, the second having been carried 
out after refilling the gallium and arsenic source furnaces. 
Thus excellent control and repeatability are readily 
achieved in the MBE growth of GaAs. 

To achieve superior performance of electronic devices, 
it Is necessary not only to control carrier concentration 
precisely, but also to assure good carrier transport proper- 
ties. A common measure of the quality of n-type GaAs is 
the electron mobility. Mobility, as the term suggests, is a 
measure of the ease with which the charge carriers move 
through the crystal lattice. In particular, high mobilities in 
lightly doped n-type GaAs at low temperatures (e>g, , liquid- 
nitrogen temperature of 77K) are indicative of low concen- 
trations of background impurities since, under these condi- 
tions, the mobility is dominated by scattering of electrons 
by impurities in the crystal. Mobilities for Si-doped, MBE- 
grown GaAs are shown in Fig, 5 as a function of the free 
electron concentration. This data was obtained at a temper- 
ature of 77K. where impurity scattering dominates. The 
solid line shown for Low doping levels [below Sxiu 1 ' 
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cm -3 ) represents mobilities for extremely high-purity GaAs 
grown by the AsCl 3 vapor phase epitaxy (VPE) technique. B 
It is clear that the MBE GaAs compares very favorably with 
the high-purity VPE material, 

The control of the MBE growth of A^Ga! x As is dem- 
onstrated in Fig. 6 where growth rate is shown as a function 
of gallium (Fig. 6a) and aluminum (Fig. 6b) source temper- 
atures , The symbols represent experimental data for differ- 
ent growths, The solid curves represent the relative temper- 
ature dependencies of Ihe vapor pressures of gallium and 
aluminum. Thus, by appropriate choices for the gallium 
and aluminum source temperatures, both growth rate and 
alloy composition of GaAs and A! x Ga lTK As can be readily 
controlled and reproduced using MBE, 
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Extending Millimeter-Wave Diode 
Operation to 110 GHz 

by Eric R. Ehlers, Sigurd W. Johnsen, and Douglas A. Gray 



N THE LATE 1970s, silicon Schottky diodes and point 
contacts dominated diode applications in the micro- 
wave and low millimeter-wave range of frequencies. By 
using carefully designed microwave structures, perfor- 
mance well into the mi Hi meter- wave range could be 
achieved. This performance, however, did not come easily. 
These structures were difficult to reproduce and as a result 
were very expensive. The small diode structures were also 
sensitive to damage by electrostatic discharge (ESD). Al- 
though Schottky diode pairs and quads were readily avail- 
able, diodes integrated with other circuit elements could 
not be obtained as off-the-shelf components. 

When HP launched its effort to develop a family oi diodes 
particularly suited to millimeter-wave applications, we 
wanted a device that could be readily integrated into a 
microwave structure, was rugged and capable of handling 
high drive levels for multiplier applications, and had at 
least the sensitivity of presently available diodes. 

Diode Technology 

To meet the project goals of designing low~capacitance t 
high-power-handling diodes suitable for small-scale inte- 
grated circuits, the selection of material technology was of 
paramount importance. Although silicon technology was 
very mature and some silicon devices did exist for applica- 
tions up to 40 GHz, the high parasitic capacitance of silicon 



substrates made it difficult to integrate other circuit ele- 
ments and beam leads without sacrificing high-frequency 
performance or power handling. It was clear that the design 
and fabrication of a useful device up to 110 GHz (W-band] 
required a different material technology. 

The DT-V compound semiconductor gallium arsenide 
(GaAs) offers several advantages over silicon. The intrinsic 
doping in GaAs is two orders of magnitude lower than 
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Fig. 2. Cross section of sintered Schottky barrier diode 

silicon, This results in a much higher substrate resistivity 
and lower parasitic: capacitance effects. The lower parasitic 
capacitance of GaAs substrates is critical. It allows the de- 
sign of larger devices with increased power handling capa- 
bilities without degrading frequency performance. 

Another advantage of GaAs is its higher electron mobility 
(see Fig, 1). This higher mobility translates up to a 5 x 
higher electron velocity in GaAs compared to silicon. This 
leads to much lower carrier transit times and makes it 
possible to design devices exhibiting cutoff frequencies in 
excess of 1000 GHz. 

For these reasons GaAs was the material of choice for 
the millimeter-wave diode project. There was a major dis- 
advantage, however, because a primary goal of the project 
was to design low-barrier and medium-barrier diodes 
which are commonly used in mixer and detector applica- 
tions. Although HP has been very active in GaAs research 
and development (HP instrumentation has been shipped 



using internal GaAs FET and diode technology since the 
early 1970s), a reliable method of repeatably fabricating a 
modified barrier diode on GaAs presented a major technical 
challenge, 

The difficulties in fabricating modified barrier diodes on 
GaAs were overcome by a two-phase approach. Slandard 
Schottky barrier diodes are fabricated using liquid-phase 
epitaxy to grow the double n*on-n -f layers required for 
millimeter-wave IC fabrication. Low-barrier Schottky 
diodes were originally fabricated using the same epitaxial 
layers, but with a proprietary Schottky barrier metallization 
_ The lo\v-barrier diode was formed by a sintering 
process after the Schottky metal was patterned. The sinter- 
ing formed a thin, heavily doped n + layer under the anode 
which lowered the barrier height to the desired value (see 
Fig. 3). This process led to the introduction of a 26-GHz-to- 
40-GHz zero-biased detector fully integrated on one GaAs 
chip (see box on page 13]. The sintering process was, 
however, very difficult to control and the devices %vere 
extremely sensitive to ESD, which resulted in relatively 
low yields during microcircuit assembly. However, even 
with the low yields, this device represented a significant 
advance in the state of the art of millimeter-wave device 
technology. 

The second generation of modified barrier diodes was 
made possible by the capabilities provided by molecular 
beam epitaxy [M BE, see article on page 4). MBE technology 
provides the means for very controlled growth of the epi- 
taxial layers required for the fabrication of planar doped 
barrier diodes [also called modified barrier diodes by HP). 
These diodes are formed hy growing a multilayer GaAs 
epitaxial structure sandwiched between two low-resistance 
contacts (see Fig, 4). The design of the epitaxial structure 
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Fig. 4, (a) Cross section of planar doped barrier diode fab- 
ricated by molecular beam epitaxy (MBE). (b) i-V characteris- 
tic. 

determines the barrier height of the diode. Therefore, ex- 
tremely tight control of epitaxial layer doping and thickness 
is required. Epitaxial layers as thin as 4 nm must he grown 
with abrupt transitions between layers. This critical step 
in the fabrication of modified barrier diodes has been 
moved out of the process lab and placed on the shoulders 
of the computer-controlled MBE syslem. The slate of MBE 



technology at HP has demonstrated its consistent perfor- 
mance to these demanding specifications, enabling the de- 
sign and fabrication of device structures for both low-bar- 
rier and medium-barrier diode applications. 

With the technology in place for manufacturing GaAs 
diodes with varying barrier heights, the next step was to 
provide the capability for fully integrated solutions of some 
standard inillimeter-wave circuit functions. Most of the 
mi Hi meter- wave circuits used in i i is I ru mentation are 
samplers, mixers, multipliers, and detectors. At frequen- 
cies above 26 GHz, the repeatability of hybrid assembly 
techniques is critical, A slight variation in a bond-wire 
length can be a significant fraction of a wavelength at these 
frequencies. The integration of a mi Hi meter- wave diode, 
resistors, and capacitors on a single GaAs chip with beam 
leads for thin-film circuit interconnects offers superior per- 
formance in phase matching for samplers and greatly re- 
duces the complexity and assembly costs of millimeter- 
wave hybrid circuits, The integration process developed 
provides I he circuit designer with the flexibility for inte- 
grating silicon nitride capacitors with values up to 20 pF 
and tantalum nitride resistors with a sheet resistivitv of 50 

no 

The technology is directed toward allowing design en- 
gineers working on new millimeter-wave test instruments 
to design their own custom integrated circuits, There are 
some roadblocks, however. A designer must work very 
closely with the 1C facility in make sure process guidelines 
or limits are not violated by the design. This can be the 
most time-consuming and frustrating part of the design 
stage, There is also a fear of the high development costs 
I hat are normally associated with a custom IC design. This 
lad or is especially a concern at millimeter- w T ave frequen- 
cies where current computer-aided-design techniques are 
not sufficient to ensure a successful circuit design on the 
first try. 

To lessen some of these difficulties and to encourage the 
use of this technology at HP, it was decided to offer it in 
a prototype mode. In this approach, circuit designers can 




Fig. 5, fa, i Composite mask for prototyping modified barrier diode /C designs, (b) Production 
mask used for volume production of successful designs. 
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26.5-to-40-GHz Waveguide 
Detector 



in 1983 Hewlett-Packard introduced the HP 1 T664D 26.5-to 

40-GHz waveguide detector for the HP 8756A Scalar Network 

~er Th& HP 1 1 664D uses a Scfrottky diode with a barner 

height modsfied by a smtehng process in an integrated defector 

This CfrcLH! has proven reliable in tnree years ct 
the field However, manufacturing costs of the product are high 
because of difficulties In controlling the device fabrication pro- 
cess and losses in assembly caused by static electricity dis- 
charges "Hie new diodes, described in the accompanying article, 
are significantly easier to manufacture, and losses during assem- 
bly are reduced by a more rugged structure, 
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Fig, 1. Typicai return loss (a) and flatness (b) of HP 1 7B64D 
Detector 

Experience 

To date, manufacturing losses during assembly with the sin- 
lered diodes has occasionally run as high as 50% Under the 
same conditions, there have been no failures of the new modified 
barrier diodes This Is attributed to the tenfold reduction in elec- 
trostatic susceptibility of these diodes. Slightly Sower capacitance 
has also led to an improvement in the frequency response of the 
HP 11664D. The lower capacitance allowed a small assembly 
change that resulted Iri a 1 0-dB improvement in flatness as a 
function of frequency These diodes also have a more predictable 
voltage response as a function of input power This allows im- 
proved dynamic accuracy for the HP 1 1 664D , Fig t shows typical 
return loss and flatness of the HP 1 1 664 D Detector 

Reference 

i C C Chang, et ai. A Zeroeift$ GaAs MlWlmeter Wave integraled Detector 
t£E£ Microw&VB Thsory and Techniqtjes Symposium Otgast 1982 

Herb Upham 

Development Engineer 

Microwave Technology Division 



design custom devices using the components supported in 
the integrated diode process and optimize the design for 
their specific application. The design is first placed on a 
composite mask set (Fig, 5a) with many other custom inte- 
grated diodes. This composite mask set is fabricated to 
sup: -rious designers with a small number of sam- 

ples of their design for evaluation, If another design itera- 
tion is required* the redesigned device is placed on another 
composite mask for another cycle of protot3 r pe samples. 
This spreads out the development cast of the custom de- 
vices to all the designers using a composite mask. It makes 
it more affordable to do several design iterations or -.- 
try a high-risk design idea. Once a design is optimized and 
assigned to an instrument project, it is stepped onto a ded- 
icated mask set (Fig. 5b) for higher-volume production. 

The development of this integrated diode technology was 
very closely linked with several instrument development 
projects. The HP 8510A Network Analyzer and the HP 
5 3 50 A 51 A/52 A Frequency Counters' rely on GaAs inte- 
grated samplers for down-conversion. The design of the 
HP 11970771 Harmonic Mixers and the HP 83554 A/5 A^6A 
Millimeter-Wave Sources, which both cover the waveguide 
bands up to 60 GHz, also relies heavily on these Schottky 
barrier diode ICs. A major milestone, however, was the 
development of the planar doped barrier diode which en- 
abled HP to design the HP 11970V/W Harmonic Mixers 
that allow spectrum measurements up to 110 GHz. 

Planar Doped Barrier Diode Applications 

Special Features of these modified barrier diodes make 
them ideally suited lor certain key millimeter-wave prod- 
ucts, Before discussing these applications, however, it 
should be pointed out that in terms of their terminal voltage 
and current characteristics at low frequencies, these diodes 
are very similar to conventional pn junction diodes and 
metal -semiconductor Schottky diodes. In forward bias, 
they closely follow the ideal diode equation as Illustrated 
it] Fig. 3: 

I = I„[expl(V-iKJ/nV,)-l] 

where I M is the saturation current, n is the ideality factor 
(typically 1.2 for these modified barrier diodes), R b is the 
series resistance, and V r is the thermal voltage (H.026V at 
300K). In reverse bias, the low-barrier versions of these 
modified barrier diodes deviate from this ideal equation 
and exhibit a 4 \sofl" breakdown characteristic caused by 
I be variation of barrier height with bias (see Fig, 3)- 

The most important feature of the planar doped barrier 
diodes is that their barrier heights are adjustable by control- 
ling the thickness and doping of the epitaxial layers. Chang- 
ing t h e ba rri er h eigh t i s equ i va lent to varying the saturation 
current and hence I he turn-on voltage ol the diode. Control- 
ling the turn-on voltage is highly desirable for certain ap- 
pLicalions. One of these applications is Lhe unbiased har- 
monic mixer (see article on page 22), For I his application 
it is critical that the conduction angle (the number of de- 
grees during each local oscillator (LO) cycle that the diode 
is on] be controlled precisely, and for a given LO power 
the conduction angle is controlled by the barrier height. 

Another application that requires a low barrier height i.s 
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the broadband unbiased detector. Diode detectors are 
widely used at microwave and millimeter- wave frequen- 
cies as power meter sensors, scalar network analyzer detec- 
tors, and parts of leveling circuits for sources. To work 
well as an unbiased detector, a diode must essentially be 
1 urned on at zero volts. This implies a high saturation cur- 
rent and a low barrier height. A critical specification for 
these diodes is the video resistance R v , which can be de- 
fined as the incremental resistance of theviiode at a bias 
of zero volts. For most detector applications R v must be 
between 50011 and 5 kll. This requires precise control of 
barrier height and is an ideal application for planar doped 
barrier diodes. 

An important parameter lor microwave and millimeter- 
wave diodes is junction capacitance, which must be kept 
as small as possible. Like a Schottky diode, a modified 
barrier diode has no minority charge storage capacitance, 
so the junction capacitance is quite low. The reverse-bias 
capacitance of a modified barrier diode is determined 
primarily by the dielectric constant of GaAs and the thick- 
ness of the intrinsic layer and has less variation with bias 
than Schottky diodes. The typical junction capacitance of 
a modified barrier diode with an active area of 40 /xmr is 
only 25 femtofarads at a bias of zero volts, and with an 
effective series resistance of about 20 ohms, these diodes 
work well in detector applications at frequencies up to 110 



GHz, 

The modified barrier diodes are remarkably rugged de- 
vices. Physical ly, they have good beam-lead pull strengths 
of over 5 grams and a passivation that allows them to be 
used In nonhermetic packages. Electrically, they can reli- 
ably operate at high current densities; the diodes used in 
the millimeter-wave mixers described in the article on page 
22 are rated to operate continuously at 16 dBm of LO power, 
which corresponds to average currents through the diodes 
of over 16 mA. These diodes are also relatively insensitive 
to damage caused by electrostatic discharge (ESD). 

The fact that modified barrier diodes are available with 
integrated resistors and capacitors enhances their useful- 
ness to the design engineer. Integrating resistors and 
capacitors with closely matched diodes can, in many cases, 
improve the performance of a product and lower its cost 
by decreasing parts count and shortening assembly times. 
The development and production of these modified barrier 
diode circuits provides the millimeter- wave designer with 
a new tool for the design of detectors and mixers up to 110 
GHz. 
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Diode Integrated Circuits for Millimeter- 
Wave Applications 

GaAs diode integrated circuits based en metal - 
semiconductor (Schottky) or modified barrier diodes have 
now extended the operating frequency range of small-scale 
ICs beyond 1 00 GHz. These circuits, which form the basis 
for many of HP's new millimeter-wave instruments, are 
useful for nonlinear and frequency-translation applications . 

by Mark P. Zurakowski, Domingo A. Figueredo, Scott S. Elliott, George A. Patterson, William J. Anklam, 
and Susan R. Sloan 



GALLIUM ARSENIDE [GaAs] has become the ma- 
terial of choice for building devices or integrated 
circuits for operation at frequencies higher than a 
few gigahertz. Several manufacturers, or foundries, a re now 
offering design and processing services for small-scale to 
medium-scale integrated circuits on GaAs operating to fre- 
quencies as high as 10 GHz and HP has developed a tech- 
nology for the design and fabrication of diode integrated 
circuits on GaAs that operate to above 100 GHz* These 



circuits contain no linear gain elements such as transistors; 
instead they are composed of diodes, resistors, capacitors, 
and conductive transmission lines. Their design makes 
them very useful for signal detection, mixing, multiplica- 
tion, sampling, power limiting, and other frequency-trans- 
lating or nonlinear circuits in the rnillirneter-wave fre- 
quency range. 

The millimeter-wave region of the electromagnetic spec- 
trum is generally considered to span the frequency range 



14 HEWLETT PACKARD JOURNAL NOVEMBER 1966 



)Copr. 1949-1998 Hewlett-Packard Co. 




Fig, 1. Scanning electron micro- 
scope {$EM) photomicrograph of 
a GaAs Schottky barrier diode in- 
tegrated circuit 
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of 26 GHz to about 300 GHz. a region in which the 
wavelength of electromagnetic radiation is of the order of 
a few millimeters in free space. This region is rapidly gain- 
ing importance for uses such as high-speed digital com- 
munication, high-resolution radar, military guidance sys- 
tems, and radio astronomy. Unfortunately, circuit elements 
such as transistors are available only for the very low end 
of this frequency band and even small-scale integrated cir- 
cuits have been virtually unobtainable. 

Millimeter-wave frequencies are so high that microwave- 
type hybrid circuits are difficult to construct Even the 
smallest of elements or bond wires t:an be a significant 
fraction of a wavelength long and therefore can no longer 
be modeled as a lumped element. The resulting parasitic 
capacitances and inductances can completely alter the elec- 
trical performance of a hybrid circuit, rendering it unre- 
peatable, unreliable, and sensitive to its environment. In 
addition, some applications require symmetry and phase- 
matching, which are very difficult to achieve in hybrid 
circuits. Hence, most designs at millimeter- wave frequen- 
cies have been done in metal or dielectric waveguide using 
expensive machined transitions to do the signal processing. 

To date, the most important signal processing circuits 
for millimeter-wave applications are frequency-translating 
circuits — those circui Is used to impress or multiplex lower- 
frequency information on a millimeter-wave carrier signal, 
or to shift the information from the mi lit meter-wave band 
down to a lower frequency w T here sophisticated signal pro- 
cessing such as amplification or filtering can be done. These 
translation functions can be handled very elegantly using 
diode integrated circuit technology. 

A diode integrated circuit is composed of one or more 
diodes in combination with resistive strips, conductive 
lines, and capacitive elements patterned in accordance 
with a set of design rules. Two layers of metallization are 
presently available, allowing the realization of non planar 
circuit geometries. The completed component is a small 
pellet of GaAs with beam leads which allow h it'll -stnmglh. 
low-parasitic bonding to a waveguide or substrate, A photo- 
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Fig, 3. Typical Schottky diode IV 
characteristic The straight line il- 
lustrates classic behavior. The de- 
viation from the line for actual de- 
vices is caused by leakage cur- 
rent for low -current levels and by 
series resistance for high-current 
levels. 



micrograph of a sample chip is shown in Fig. 1 . 

Schottky Barrier Diodes 

All of the circuit applications mentioned above require 
one or more nonlinear circuit elements that operate at mil- 
limeter-wave frequencies. GaAs Schottky barrier diodes are 
a good choice for such high-frequency operation because 
pi the higher velocities at which electrons travel in GaAs 
combined with the low charge storage effects exhibited by 
lhe.se diodes. 

A Schottky barrier is formed whenever a metal makes 
contact with a semiconductor surface. Ideally, equalization 
di I he metal and semiconductor (n type) Fermi levels causes 
the transfer of electrons from the semiconductor into the 
metal, depleting the semiconductor of mobile charge car- 



riers and generating a positive space charge region. This 
causes the energy bands in the semiconductor (Fig. 2) to 
bend up at the interface, which creates a barrier potential 
dependent on the work function of the metal. Besides this 
fundamental process, if there are large numbers of surface 
states present on the semiconductor, then instead of elec- 
tron transfer to the metal, it is electron transfer to the surface 
states that causes carrier depletion in the semiconductor 
and band bending near its surface. Fig. 2a shows the nega- 
tively charged surface states as minus signs at the metal/ 
semiconductor interface. If the density of surface states is 
sufficiently large, the barrier potential becomes virtually 
independent of the metal's work function. GaAs falls into 
this category; the surface states are generated by damag' 
caused by the metal deposition. 



Schottky Metal 




Proton-Isolated GaAs 
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Fig, 4. Single Schottky diode, (a) Top view, (b) Cross section, (c) Equivalent circuit. R f repre- 
sents the nonlinear, exponential part of the circuit r R s the series resistance, C { the junction 
capacitance, C the parasitic capacitance, and L p the beam lead inductance. 
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An externally applied voltage will alter the barrier poten- 
tial for electrons owing from the semiconductor into the 
metal. Under forward bias, the conduction band is bent 
upward, allowing electrons to flow* freely into the met a J 
(Fig. 2b). Under reverse bias, the SchottJty barrier potential 
prevents electron flow from the metal to the semiconductor 
(Fig, 2c). The current flow is governed by thermionic emis- 
sion of electrons from the semiconductor into the metal 
with a current -versus- voltage relationship described by: 



Energy (eVj 



J = JJexp(qV/nkT)-l] 



m 



where J is the current density in A/cm , J CJ is the reverse 
saturation current density* T is the temperature in K, q is 
the electron charge in coulombs. V is the applied voltage, 
and k is Bo i tern aim's constant. The ideality factor n, typi- 
cally 1*1 to 1*2, accounts for tunneling currents and other 
onnideal behavior, J is given by the expression: ] = A*T J 
expt-q^/kT) where ^ is the barrier height and A* is 
Richardson's constant given by A* = 4Trqm*k 2 /h 3 where 
m* is the effective mass of the charge carrier (in this case 
an electron! &nd h is Planck's constant. A typical I-V charac- 
teristic is shown in Fig. 3 along with a plot of the ideality 
factor n as a function of bias, 

A single Schottky diode and Its equivalent circuit axe 
shown in Fig. 4, The nonlinear I-V characteristic of equa- 
tion 1 is modeled by a voltage dependent nonlinear resistor 
Rj.The junction capacitance Cj of the diode depletion region 
is voltage dependent and for uniformly doped epitaxial 
layers is given by: 



and 



c = tyVi-cv/<M 



C u = AVq£N d /2<^ 



where ^ is the built-in potential [Fig, 2), C Jn is the zero-bias 
capacitance. A is the area of the junction, N l( is the doping 
of the epitaxial layer, and e is the dielectric constant of the 
semiconductor. The parasitic elements are the series resis- 
tance Rj and the beam-lead/Schottky-ohmic inductance L p 
and capacitance C p . Major contributors to R, are contact 
resistances, the undepleted epitaxial layer resistance, and 
metal layer resistances, Minimizing the effects of the par- 



Ferrrn Level 




Depth into Substrate 

Fig. 5. Energy band diagram of a modified barrier diode at 
zero bias 

as i tic elements is extremely important since, with C r they 
set an upper limit to the frequency of operation. 



Modified Barrier Diodes 

In many product applications, high-speed, variable-h-n 
rier-height diodes are very desirable because of the added 
flexibility they provide in choosing optimum response, hiris 
conditions, and power- handling capability. In the ca 
standard GaAs Schottky diodes, this flexibility is almost 
nonexistent because the barrier height is virtually indepen- 
dent of the metal system used for the Schottky contact. To 
address the variable-barrier needs of HP's millimeter- vl eve 
customers* a new device called a modified barrier diode 
has been developed that allows a designer to choose diode 
barrier heights between 0.2 eV and T2 eV\ Production of 
modified barrier diodes requires a technique called miil'f u 
lar beam epitaxy (MBE, see article on page 4) to grow ex- 
tremely thin, highly doped la vers. The major advantage pro- 
vided by MBE is that the number of fixed charges Intro- 
duced into (he layer determining the barrier height is vrrv 
accurately controlled by this layer's doping and thickness. 

The energy band diagram for a modified barrier diode is 
triangular in shape as depicted in Fig, 5. it is very Similar 
to the diagram for a Schottky diode and, with good approx- 
imation, Eq nation 1 can also be used for modified barrier 
diodes. All that is needed is an equation relating doping 
levels and layer thickness to barrier height. The epitaxial 
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Fig, 6. Cross section of modified 
tamer diode fabricated using 
molecular beam epitaxy (MBF) 
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layers forming the modified barrier diode and the variables 
I hat will be used in the following simple theory of operation 
are shown in Fig. 6. Gauss' Law is applied using the follow- 
ing assumptions: 

h Depletion approximation. That is, all space-charge reg- 
ions are fully depleted (no mobile charge carriers) and 
have abrupt boundaries- 
Infinite planes of charge, 

All dopings are uniform for simplicity of integration. 
m Charges are not discrete but spread throughout the layers , 
The bottom layer is far from the barrier and does not 
enter into the calculation. 
The last assumption allows the i-V equation for the mod- 
ified barrier to have the same form as for the Schottky 
barrier. 
Gauss' Law: 



Metal 



i 



E # nds = 






yields E = cr/2 for an infinite plane of charge where E is the 
electric field, n is the normal to the plane, a is the sheet 
charge density, and e represents the semiconductor dielec- 
in< constant. 

For the structure shown in Fig* 6 t there is a continuum 
of charge planes. Thus: 

E = (2qN H t - 2qN a x){l/2t] for O^x^t 

E = (2qN a t + 2qN d x)(l/2 £ ] for -t'^x^O 

where N a is the volume density of acceptor ions, N d is the 
volume density of donor ions, t is the thickness of the p + 
layer, and I' is the thickness of the n+ layer. 

Integrating E from — t r to t and using t'-t(N a /N d ) from 
charge balance, we obtain: 



P+ 






Intrinsic 



-<h, = (1/2.0 [N, + (N*/N d ]]t 2 



(2) 



Ffg, 8, Cross section of modified barrier diode. Equivalent 
circuit and element definitions same as for Fig 4. 

This equation gives the approximate barrier height of a 
modified barrier diode and can be substituted directly into 
Equation 1 for the Schottky diode I-V characteristic. 

Comparison of a measured I-V characteristic with a curve 
predicted by the simple theory presented above is show r n 
in Fig. 7. The reverse characteristics are not modeled well 
by the simple theory for low-barrier diodes because the 
simple theory neglects the effect of the electrons in the 
intrinsic region on the barrier height. In reality the barrier 
height is lowered slightly in the reverse bias direction and 
raised slightly in the forward bias direction. This can be 
taken into account using Stanford University's SEDAN 
(SEmiconductor Device ANalysis) modeling program. 
SEDAN is a one-dimensional numerical simulator which 
solves simultaneously the continuity equations and Pois- 
son's equation for the electrostatic potential and carrier 
concentrations as a function of space and time. The third 
I-V characteristic in Fig. 7 is predicted by SEDAN, Notice 
that it predicts much higher leakage current in the reverse 
direction and more closely follows the actual 1-V charac- 
teristic of the measured diode. The I-V characteristics for 
the simple theory, SEDAN, and the measured diodes all 
agree for the higher barrier heights since the effect of the 
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Fig. 7. I-V characteristic for mod- 
ified barrier diode Solid color 
curve is that predicted by the sim- 
ple theory discussed in the text, 
dashed coior curve is that pre- 
dicted by Stanford University's 
SEDAN modeling program, and 
the sofrd black curve is measured 
data 
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Fig. 9. Schottky barrier diode fabrication, (a) Nonconductive 
areas formed by proton isolation (hydrogen ton implantation}. 
(b) Ohmic contacts formed by etching top layer, depositing 
metal in holes, and at!oytng metal to bottom layer (c) The 
steps for (b) are used at the same time to form the bottom 
plates of the capacitors in the proton-isolated regtons (d) 
The Schottky contact is formed using a irimetal process of 
titanium, platinum, and gold 



barrier lowering is a much smaller portion of the intrinsic 
barrier height. 

Fig. 8 shows a cross section of a modified barrier diode. 
Its corresponding circuit model is the same as that shown 
in Fig. 4 for a standard Schottky barrier diode, but io this 
case C, is no longer dependent on the applied voltage, 
except for the lowest barrier diodes. For a modified barrier 
diode, G t depends only on the area and intrinsic layer thick- 
ness. 

Diode 1C Fabrication 

Diode integrated circuits are fabricated using standard 
photolithographic techniques. The substrates are semi -in- 
sulating gallium arsenide wafers with reasonably low de- 
fect density. In the case of Schottky diode ICs, Uqutd-phase 
epitaxy \\.V\-': la used to pow single-crystal layers oi doped 
GaA.s. The first layer consists of about 300 nm of heavily 
doped n-type GaAs (about 10 ie /cm 3 ) to serve as a bottom 
conducting surface for the diodes. The top layer is designed 
to minimize the zero bias junction capacitance while main- 
taining reasonably low series resistance. We typically 
choose this layer to be about 300 nm of 10 I7 /cm 3 n-type 
material. 

The first step in the process is to select and isolate regions 
of the substrate to be used for diodes. The technique of 
proton isolation, or ion implantation with hydrogen ions, 
is used to destroy the conductivity of the epitaxial layers 
in all areas of the wafer except where diodes are to be 
located (see Fig. 9a). Holes are then etched through the top 
layer to the heavily doped n -<- layer aoti metal is deposited 
in these holes. The metal is alloyed through a sintering 
step to form a low-resistance (ohmic) contact to the bottom 
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Fig. 10* (a) Cross section of integrated structure (b) Struc- 
ture of (a) with polytmtde insulating layer and beam leads 
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Fig. 11. Photomicrograph of com- 
pleted diode iC chip mounted up- 
side down on a microwave carrier 



layer, and thus to the bottom of the diodes (Fig. 9b). 

The same etch and deposition steps used to form the 
ohmic contacts are also used to form the bottom plates for 
capacitors as shown in Fig. 9c, The GaAs etch is done in 
a proton isolated region of the substrate, so that the metal 
does not form an ohmic contact in this case. Resistive strips 
of tantalum nitride are formed next on the semi -insulating 
region of the GaAs by field deposition and selective etching, 
The resistivity of these films is nominally 50nC The 
capacitor dielectric, consisting of about 10U nm of silicon 
nitride, is patterned on top of the metal bottom plate to 
yield a nominal capacitance density of 0,67 fF//u.irr. 

The critical Schottky barrier contact Is now formed. A 
trimetal system of titanium-platinum-gold is used to form 
a s labia and highly reliable contact that exhibits low para- 
sitic resistance and a low potential for electromigration. 
The titanium adheres well to GaAs and does not easily 
form alloys w r ith it The gold is highly conductive and 



extremely resistant to corrosion or electromigration, but it 
does interdiffuse easily with both titanium and GaAs. 
Therefore, a thin layer of platinum is used to separate the 
titanium from the gold and adheres very well to both. 

Photoresist is first applied to the whole wafer and win- 
dows are defined where the metals will eventually be lo- 
cated, The three metals are applied sequentially in the 
same evaporation system, and the unwanted regions of 
metal are lifted off by dissolving the photoresist under- 
neath, This process allows very good control of metaJ strip 
geometries down to around 800 nm, which is near the 
low r er limit for optical lithography. A cross section of the 
resulting Schottky contact finger is shown in Fig. 9d, The 
same metal deposition and lift-off procedure is used to 
form the top plates of the capacitors and to make connec- 
tions between all of the various circuit elements (see Fig, 
10a). 

The entire circuit is passivated and planarized by a 1-fim 




Fig. 1 2. Photomicrograph of single 

beam-lead diode mounted on a 
copianar waveguide structure for 
nonlinear and linear testing at 
microwave frequencies. 
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Fig. 13. Diagram of fmttne test 
package used to test beam -lead 
diodes at mtilimeter-wave fre- 
quencies 



thick coating of polyimide which Is spun on and cured. 
Contacts to the outside world are made by selectively etch- 
ing vias (holes) through the polyimide to the desired metal 
areas. Added areas of metal are photolithographically 
formed on tap of the polyimide and plated to several micro- 
meters thick with gold to form beam leads or a second 
contact metal layer [see Fig. 10b). 

The wafer is mounted circuit-side down on a quartz wafer 
and backlapped carefully to a thickness of 50 /itn. A back- 
side alignment to the patterned front surface is performed 
using an infrared aligner to expose photoresist which de- 
fines etch channels. A chemical etchant is then used to 
separate the chips and expose the gold beams. A completed 
chip, mounted upside down on a microwave carrier, is 
shown in Fig. 11. 

Fabrication of modified barrier diode integrated circuits 
is quite similar except for the addition of a mesa etch to 
define the diodes and replacement of the Schottky metal 
deposition with a low-resistance top metal deposition and 
alloy, 

GaAs Diode IC Characterization 

During fabrication the wafers are monitored by periodi- 
cally checking standard test patterns included on each 
mask set. The patterns are probed and dc-to-1-MHz tests 
are done as the IC is fabricated, From these tests we can 
evaluate the quality of the circuit elements at various steps 
in the process. Upon completion of the fabrication process 
each diode IC is tested with an automated test system which 
screens the ICs for specified dc parameters. 

Some applications require testing of parameters at micro- 
wave and millimeter-wave frequencies, This testing is more 
difficult and is usually done on a sample basis. A fixture 
useful for both linear and nonlinear testing at microwave 
frequencies is shown in Fig. 12 in which a single beam-lead 



diode is mounted on a coplanar waveguide structure. A 
network analyzer can be used to measure the four s-param- 
eters of this configuration. Tins data can be compared to 
that predicted by a linear model of the device to find the 
element values in the model that most closely fit, Nonlinear 
parameters such as harmonic generation efficiency or series 
power limiting can also be measured. 

A fixture that has proven useful for millimeter- wave fre- 
quencies is shown in Fig. 13- This circuit consists of a 
fin line substrate in a waveguide housing. The finline is 
composed of a thin film of gold on a fused silica or sapphire 
substrate that tapers to a narrow gap at the center, It pro- 
vides a broadband, low-VSWR transition from Lhe wave- 
guide mode to a finline mode. A diode or diode integrated 
circuit can be mounted at the neck of this circuit as shown, 
A^ain. full two-port s- parameter measurements can be 
made over the waveguide band using a network analyzer, 
or nonlinear measurements can be performed. 

Conclusion 

Hewlett-Packard has developed technologies for fabrica- 
tion and design of custom diode integrated circuits that 
operate beyond 100 CHz. These ICs form the basis for many 
of HP's new millimeter-wave instruments, but are not avail- 
able as separate products. 
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Unbiased Subharmonic Mixers for 
Millimeter-Wave Spectrum Analyzers 



by Robert J. Matreci 



EXTERNAL WAVEGUIDE MIXERS are used to extend 
the frequency range of a microwave spectrum ana- 
lyzer beyond its frequency limit for a coaxial connec- 
tor input. Since a microwave analyzer's local oscillator 
(LO) has a typical range of 2 to 6,2 GHz, the mixers must 
operate on higher-order harmonics of the LO. Tn maintain 
the analyzer's amplitude measurement accuracy, individual 
calibration and flat frequency response of the mixer are 
mandatory. If there are no electrical or mechanical adjust- 
ments to be made as a function of frequency (e.g., bias 
current or baekshort position), then wideband and auto- 
mated spectral measurements are possible, High burnout 
level and ruggedness are required for reliability 

Even-Harmonic Mixer Design 

The HP 11970 family of waveguide mixers 1 is designed 
for use with the HP 8566B Spectrum Analyzer or the HP 
70000 Modular Measurement System (Fig. 1J. Six wave- 
guide bands from 18 to 110 GHz are covered. 

In the schematic of Fig. 2, a microstrip diplexer separates 
the analyzer's LO from the intermediate frequency (IF) mix- 
ing product. The RF signal enters via a tapered waveguide 
section which is terminated by the diodes. The anti parallel- 
pair diode configuration 2 gives improved conversion toss 



over single-dtode circuits since the conductance waveform 
symmetry produces mixing products caused only by even 
harmonics o? the LO- Thus, less energy is lost to unused 
harmonics. The back4o-back diode connection also in- 
creases electrostatic voltage damage resistance. 

The diodes, Fig. 3, are a monolithic GaAs pair* produced 
by HP's Microwave Technology Division. A 250-jmvdiam- 
eter loop connects the two diodes and contains the even- 
order current harmonics. The zero-bias capacitance C ]fl is 
13 fF per diode and the series resistance is 240, A poly- 
imide layer passivates the diodes and contributes to the 
excellent beam-lead pull strength of 15 grams, 

A key factor in realizing constant conversion loss versus 
frequency is the control of the odd harmonica of the LO 
produced in the diodes. Although the conductance (mix- 
ing) waveform contains only even harmonics of the LO, 
the input current contains only odd harmonics, if these 
odd harmonics propagate away from the diode's location 
and then reflect back from any internal or external element, 
the desired even harmonic can be severely weakened by 
the reentering signals. In previous harmonic mixers, these 
destructive interference effects led to numerous spike- like 
increases of conversion loss versus frequency, and a change 
in bias current or backshort position was required to move 




Fig. 1 . Four HP 1 1970 waveguide 

harmonic mixers connected to the 
HP 70000 Modular Measurement 
System 
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I he reconstruction phase away from cancellation. 

For the HP 11970 Mixers, odd-harmonic reflection from 
the diplexer and the LO source system are eliminated by 
the short produced at the diodes by the lumped first ele- 
ment of the 6,3^GHz low-pass filter (LPF. see Fig, 2), This 
metal-insulator-semiconductor (MIS) capacitor is also in 
series with the RF signal, so it must maintain its shorted 
condition throughout the RF band so that the entire RF 
voltage can appear across the diode. The capacitor's pad 
dimensions are 75 fxm by 75 /im and its silicorwlinxide 
(SiCy dielectric layer thickness is 440 nm, These small 
sizes push the capacitors self -resonance frequency to more 
than 110 GHz. 

The waveguide 37-GHz high-pass filter [Fig. 4) follows 
a symmetrical exp(co$ 3 #) taper, 4 and prevents odd LO har- 
monics from reflecting from the out-of-band source mis- 
match of the system being measured. Early in the develop- 
ment project, a rectangular-to-double-ridge taper was tried, 
but the large variation ifl cutoff frequency along the taper 
was itself a source of odd-harmonic reflection. Other espe- 
cially troublesome sources are components such as bends, 
twists, and even the terminated arms of directional cou- 
plers, all of which present significant reflections at or below 
their cutoff frequency. 

The high-pass filter's width taper is integrated into a 
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Fig, 4. A special taper (left) m the waveguide (right) acts as a 37-GHz high-pass fitter and 4:1 
impedance transformer, The taper is formed by numerically controlled milling machines accord- 
ing to equations describing the taper's dimensions 
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Fig. 7. Each HP 1 1970 Mixer is individually calibrated for conversion loss across its full fre- 
quency band. Data is entered into the mainframe spectrum analyzer to provide corrected 
amplitude measurements in the miiitmeter-wave bands 



modified exponential waveguide height taper, 5 which in 
a minimum length lowers the waveguide impedance by a 
factor of 4 to match the time-average impedance of the 
diode. 

The diodes are thermosonically bonded across the re- 
duced waveguide opening height (0.5 mm), one terminal 
to the bottom wall and the other onto the MIS capacitor 
attached with epoxy adhesive to tha top wall. 

The relatively large local oscillator drive level required 
(14.5 dBm to 16 dBm] is a result of the high diode barrier 
height [V r at 1 mA = 0,73V) and the need for a large con- 
duction angle. Fig. 5 shows that only conduction angles 
greater than 140 degrees will avoid nulls in the eighth or 
lenth harmonic of the conduction waveform. 

The eighteenth harmonic is required to reach the 75-to- 



110-GHz band, but the conduction angle null occurs close 
to the standard 14.5-dBm-to-16-dBm LO power range. The 
conduction angle can be altered by using more LO power 
or reducing the barrier height of the diodes. Since the vari- 
ous harmonic mixers are required to be compatible with 
the LO drive available In the spectrum analyzer system, 
nedi urn-barrier diode [V f at 1 mA = 0.28V) described 
in the articles on pages 10 and 14 was developed. This 
diode provides the same type of flat response and insen- 
sitivity to LO variation at 110 GHz (n = 18) as the hrgh- 
barrier diode does at 18 GHz (n = 6). 



Performance Measurements 
Conversion losses tot each waveguide band from 18 to 
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Fig. 9* A-band spectrum analyzer 
display for older 2.05-GHz IF har- 
monic mixers. Note that the image 
responses (N - harmonics) inter- 
leave with desired N + harmonic 
responses. 



60 GHz are shown in Fig. 6. To demonstrate the improved 
flatness and sensitivity of the unbiased HP 11970 Mixers, 
the plot for a single-diode biased mixer is shown for the 
uases of bias optimized at each frequency and a fixed bias 
optimized at only the center of the band. An example of 
the calibration chart that accompanies each HP 11970 
Mixer is shown in Fig. 7 for the 75-to-110-GHz band. 

The suppression of odd-harmonic mixing products and 
the choice of a relatively low IF frequency [321 MHz versus 
2.05 GHz used previously) aids in reducing the spectrum 
analyzer's display clutter. Because no RF preselectors are 
available for these bands, responses from harmonics adja- 
cent to the one desired also appear, The anti para II el -pair 
configuration suppresses the adjacent odd responses and 
the low IF frequency causes the wanted N-h harmonic re- 
sponse and its N - harmonic image response to be grouped 
closely with no interleaving. The resulting full band dis- 
play caused by a single input frequency has a certain 
"order" to it (see Fig. 8). Fig. 9 shows the same display for 
the earlier 2.05-GHz-IF mixers. 

A c k n owl ed g me n t s 

Kudos to C.C. Chang, Gary Roberts and Dennis Lynch 
for the original high-barrier diode. The medium-barrier 
diode group, whose articles appear in this issue, did an 



excellent and fast job. Vonnie Anderson aided with com- 
puter modeling, Jimmie Yarnell designed the mechanical 
package and processes (besides creating valuable artists' 
renditions], Helen Exicson and Dorothy Yanez handcrafted 
the prototypes, Vinh-Khuong Nguyen, Ken Lew, and Robert 
Chariton molded the efforts into a producible product with 
automated testing. Project managers Frank David and Roger 
Stan cliff reviewed the design and Roger Kahn and Ron 
Rausch marketed the devices. 
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Horizonte, Brazil He and his wife have two sons 
When not an work, he keeps busy with home im- 
provements and woodworking and likes camping 
with his family, skiing, and amateur radio 
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Sig Johnsen is applications 
engineering manager for 
GaAs diodes. PETs. and 
silicon bipolar devices at 
HP's Microwave Technol- 
ogy Division He has held 
a number of engineering 
positions since joining HP 
m 1979 He's a 1977 
graduate of the Rochester 
Institute of Tecnno log y (BS physics) and was a test 
equipment engineer at Genera! EJectric Aerospace 
before coming to HP Bom in White Plains New 




Eric R. Ehlers 

I Ehlers came to H 
1979 and has been an ap- 
plications en g 
GaAs FETs and diode in- 
tegrated circuits He's cur- 

an R&D enq 
| working on millimeter- wave 
products Born in San 
Diego. California, he 
studied biology ai the 
California Institute of Technology (BS 1973) and 
worked as a biochemist at Scnpps Clinic He con- 
tinue a his studies at the University of California at 
Santa Barbara (BSEE 1 977 and MSEE 1979) He s 
the author or coauthor ot four papers on microwave 
measurements and circuit design Eric and his wife 
nve m Santa Rosa. California and have two sons 
He's nvoived in youth soccer and leaches courses 
at a local CONege Other interests mdude karate, 
hiking, and science fiction 



Gray 

A New York City native. 
Doug Gray completed his 
studies for a BSEE degree 
from the Polytechnic Insti- 
tute of New York in 1960 
, and for an MSEE degree 
from Stanford Uriiversily in 
1962 With HP since 1960, 
he was R&D manager for 
the Microwave Tech no logy 
Division before h i s recent retiremeni He has con- 
tributed to the development of microwave mod- 
ulators, signal generators, and network analyzers 
and sweepers, among other products He has writ- 
len several papers on microwave components and 
is named author on a patent for a Gunn oscillalor 
design He and his wife live in Santa Rosa. Califor- 
nia and nave lour children One of his sons is an 
HP engineer Sailing is his favorite recreational ac- 
tivity He sails his 42-foot cutter around San Fran- 
cisco Bay and recently sailed to Hawaii He also 
enjoys photography and scuba diving 
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Susan R. Sloan 




With HP since 1965. Susan 
Sloan is an R&D engineer 
at the Microwave Technol- 
ogy Division and has con- 
tributed to the development 
ol fabrication processes lor 
high-frequency diodes 
She was born in Madison 
Wisconsin and aitended 
The University of Wisconsin 



Mark P. Zurakowski 

I Microwave Technology 
^^^B I Division. Mark Zurakowski 

^^f^~*^| I has held a number of en- 
^L t-jfcfl I 91 neering positions at HP 
IP 4| I since 1976 His most recent 

^F WT^^H proiect work has been on 
J V GaAs diode integrated cir- 

■ cuits He is named inventor 
. 1 for a patent application on 
a modified barrier diode and is author or coauthor 
of three papers related to device design and pro- 
cessing, semiconductor physics, and millimeter- 
wave devices Mark has a 1978 BS degree in 
mechanical engineering and materials science 
from the University of California at Davis He also 
earned an MS degree *n materials science and en- 
gineering from Stanford University in 1 9B2. Born m 

j. Michigan, Mark ks a resident ol Santa 
Rosa. California. He's marned and has three Chil- 
dren He enpys traveling, camping, and hiking wilh 
his family and likes woodworking and sports 



William J. Anklam 

■ Bill Anklam was born in 
I Columbus, Ohio but grew 
W-+--1 ' --- I up m Muscatme r Iowa He's 
J ? I a graduate of Iowa State 
M j* ^J I University (BSEE 1 982) and 
^k ^-B I of Stanford University 
V ^^m I (MSEE 1963 and EE1986) 
^r ^^ W% ^H Josephson devices were 
* the subject of his thesis re- 
search With HP s«nce 
1 985, he has been responsible for the design and 
characterization ol new devices and circuits for mil- 
limeter- wave applications He s coauthor of four 
papers on Josephson [unci ion fabrication and 
characterization and is interested in device 
physics, ulirafast phenomena, picosecond elec- 
tronics, and SU perconduct i vity A resident of Santa 
Rosa. California. Bill is a runner and likes back- 
packing, motorcycling, softball, golf, and mu#fc 
He's also a voracious reader 



Domingo A. Figueredo 

Domingo Figue'edo pined 
HP in 1981 upon getting his 
MSEE degree from the Uni- 
versity of California at Santa 
Barbara. Hie BSEE degree 
was awarded in 1979 by 
I the Florida Institute of Tech- 
nology His contributions at 
HP include work Oh bipolar 
transistors, on GaAs MES 
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FETS, and on modified barrrer diode technology. 

He's presently a project manager for millimeter- 
wave transistors. His work on planar doped transis- 
tors is the subject of a patent application and he's 
the author or coauthor of tour papers on microwave 
semiconductor device theory, processing, and ap* 
plications. His other professional experience in- 
cludes research on molecular beam epitaxy at the 
University of California. Bom m Barcelona, Ven- 
ezuela. Domingo now lives in Rohnert Park. Califor- 
nia. He's married and has live children. His leisure 
activities include weight training and jogging. 



Scott S* Elliott 

Scott Elliott «s the section 
manager for the develop- 
ment ot integrated diode 
circuits and other technol- 
ogy products at HP's Mi- 
crowave Technology Divi- 
sion. Born in Aberdeen, 
Washington, he studied 
electrical engineering at 
■ " 4~V ' £1 the University of California 

at Berkeley receiving his BS degree in 1969 and 
h is MS do g ree in 1 97 1 After worki ng for four yea r s 
on microwave components he continued his 
studies at the University of California at Santa Bar- 
bara and competed work tor a PhDEE m ^ 978. With 
HP since 1 978, Scott has been a development en- 
g : neer. a project manager, and a section manager. 
He was one of the developers of (he first surface- 
acoustic-wave resonators and low-loss filters to be 
used m commensal instruments. He has published 
or presented over 3D papers in the areas of lasers, 
microwave components and microwave acous- 
t-cs. He and his wtfe and two children live in Santa 
Rosa, California. 



George A. Patterson 

With HP since 1975. 
George Patterson was >n- 
strumental in determining 
the feasibility aF using MBE 
technology for devices for 
the Microwave Technology 
Division and is responsible 
for the continuing develop- 
ment of MBE. Before work- 
ing an MBE he was a pro- 
cess development engrneer for silicon and GaAs 
devices Bom in Mrneola. New York, he served in 
the U.S. Army Signal Corps and is an aJumnus of 
M i c h ig an Sta te U n i vers ity (BSE E T 969) . He al so a t- 
tended Stanford University and worked as a silicon 
process engineer at Fairchitd Sem i conductor Cor- 
poiaiFon before coming to HP, He is coauthor of a 
paper on using ion implantation for doping GaAs, 
A resident of Santa Rosa, California. George is mar- 
ried and has a son. His wife is an HP computer sys- 
tems administrator. He's an avid mountain climber 
and skier and has climbed most of the mountains 
in the Cascade Range in the western United States . 
He and his son climbed the east side of Mount Whit- 
ney, California last year, He's also a pilot and has 
nstrumem, commercial and multi-engine ratings. 
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Robert J. Mat reel 

Sf , | Bom in Chicago. Illinois, 

\&££-^z& . ' . J» Bob Matreci is an alumnus 
of Wichita State University 
{BSEE 1969) and the Uni- 
versity of Kansas (MSEE 
1973), He worked on avion- 
ics test equipment before 
™ a coming to HP <n 1-978, He 

*/7j/*XUO&!m c 0Rt r ■ bu ted to ih e Oe ve lop- 
Ufr^^^^Sm mem ol the HP 1 1 970 
Series Harmonic Mixers and is now a project man- 
ager tor milometer- wave components and systems 
H's work on a millimeter- wave component <s the 
subject of a patent application. Bob and his wife 
live in Santa Rosa, California and he's active in h>s 
church. He's a sailing enthusiast and was naviga- 
tion and communications consultant for Peter 
Bird's 1 962-83 solo row across the Pacific Ocean . 
He has also been finishing some work on his new 
house. 
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Bruce J. Hichards 

A section manager at HPs 
Knowledge Systems Labo- 
ratory, Bruce Richards has 
been responsible for sev- 
eral computer system sup- 
port tools, including Predic- 
tive Support software for 
the HP 3000 Computer He 
joined the Neely Sales Re- 
gion in 1976 and he?d vari- 
ous field support technical and management posi- 
tions before moving to his current job. A California 
native, Bruce was born in Long Beach and edu- 
cated at the University of Califormaat Irvine (BSEE 
1975). Bruce and his wife live m San Jose and are 
the parents of a new baby. An avid photographer, 
he also likes skiing and tasting and collecting wine, 




David B Wasmuth 

Dave Wasmuth is a soft- 
ware engineer and has 

worked mainly on HP Pre- 
dictive Support software 
SMice coming to HP in 1 984. 
+^^£mMf He worked in software irv 

JL^^T ^^B fegration and testing for the 
J|^B System Development Cor- 
poration and for the IBM 
Corporation before joining 
HP. Dave was bom m Dallas. Texas and graduated 
from Cat iforma State University at Chico with a BS 
degree «n computer science in 1984. A resident o 4 
Los Altos, California, he likes all kinds of sports. 






Craig M. Myles 



An HP 3000 Computer sys- 
tems specialist in HP's St. 
Louis, Missouri office. 
J — C ra ig M yles came to H P in 
t979 He specializes in 
solving MPE operating sys- 
tem problems, often writing 
software and hardware 
diagnostics He was the 
MPE specialist for the de- 
velopment of AJDA. He was born in Berea, Ohio. 
and graduated from The Principle College with a 
BS degree in physics in 1 979 after serving lor four 
years in the U S Navy Craig and his wife live in 
Si, Louis. His outside interests include scuba div- 
ing, bird-watching, swimming, and astronomy. 




Lynn R. Slater, Jr. 

' Born in El Paso, Texas, 

flfw ^^^ T Lynn Slater at tended Tex as 
flf J^flAk A A & M University and re- 
.^M_ ^mJ^L cyive d his BS degree in 
^■Br^^fL^J compute rscrence in 19B3. 
'""J jfiiL | ~S ^s also gained program- 
mm E miri 9 experience by work- 

_ u $f ^ n Q fo? several orgamza- 

tions whilefinishing nis de- 
// gree. With HP Since 1983, 

he has contributed to an internal HP software de- 
sign and development environment and is now 
project leader for the AJDA expert system, He's a 
member of IEEE and ACM , A resident of Fremont, 
CaJifornia Lynn is active in local politics. His out- 
side interests include target shooting and motor- 
cycling 



Keith A, Harrison 

Keith Harrison was born in 
Burton On Trent, Stafford - 
| shire. United Kingdom. He 
studied mathematics at 
Pembroke College, Oxford 
University, and graduated 
I with BA and MA degrees in 
1975 Before coming to HP 
in i 980 he developed com- 
puter-aided design tools 
and wrote data processing software for the King- 
dom of Saudi Arabia. At HP he has contributed to 
h? NLS/3000 and the AlDA expert system and now 
develops ways to delrver knowledge-based ser- 
vices to customers. Keith is a resident of Woking- 
ham. Berkshire. U.K. His outside interests include 
compiler design, language design, and playing re- 
corder and classical guitar 
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42 !Z: Schooner _ 

Diane M Aha rt 

• 1 Bom in hion New York 
indedthe 

-■•■ 

BS degree in computer sci- 

23 been 

wrth HP since 1 965 and is 

based knowledge acquis^ 
lion and delivery tod, She 

also maintains and supports the Schooner expert 
Before coming to HP she WGffod at .- - 

eywelijnc artificial intelligence laboratory Diane 

Wes m Foster City, California ano enjoys skiing and 

camping 



R. Michael Young 

Michael Young was bom in 

Salinas. California and 
worked as an actor and 
teacher before studying 
computer science at 
California Stare University 
at Sacramento (BS {984] 
He came to HP the same 
year and has contributed to 
the development of the 
Schooner expert system and to Common Lisp for 
HP Precision Architecture He's currently on a leave 
of absence from HP To complete a master's degree 
in computer science from Stanford University 
Michael's professional interests include theories of 
automated reasoning nonmonotonic logics, and 
modeling ol processes He lives m Sunnyvale, 
California and enjoys windsurfing and playing 
squash 




Brian f. Button 






Brian Button was educated 
nivecstty ot Caiifor- 
Serfceiey and ro- 

ng ana 



same year he /. - - 
9 system mKirocode and 

23 a project seader for the Schooner expert 
system He's now a product manager for the mul- 
ti vendor portion of the network support program 
and is interested m computer tanguages and 
knowledge representation A native of California. 
he was bom in R chnond and BVB8 in San Jose 
Recently married, he s active in the Junior 

::^r of Commerce and likes skiing and back- 
packing 
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Roy M. Vandoom 

^^^^ ^ m ' ~ Bom in Edmonton 
^^AMi Canada, Roy Vandoom 

^fc Jk- studied mathematics and 

flf]A computer science at San 

FZ |v ^B ' se State University He 

^"^ I completed work for his BA 

L jfcjO> W degree m 1 976 and tor his 

™ MS degree m 1980 He has 
been wrth HP since 1980 
and ^s an R&D project man- 
ager for advanced diagnostic tools at tne Knowl- 
edge Systems Laboratory. He has worked on I FT 
and on a software defect tracking system He's also 
a member of the American Association for Artificial 
intelligence. A resident ofSan Jose. California, Roy 
is a scuba diver and helps to I rain other divers 
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George ft. Gotisohalk 

R&O software engineer 
George GottscnaJk has been 
since T98A He has 
contributed to tne develop- 
mem of IFF and conflmues to 
work pn other software sup- 
oon tools George * . 

-marsu. Japan and 
attended San Jose State 
-■sity. He studied 
mathematics and computer science and received 
his BA degree in 19B4 He snow living m Palo Alto, 
California and enjoys hiking, backpacking, and 
drawing cartoons He says he's also working on an 
epic poem, about IPT 
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Robert I. Marcus 

Bob Marcus was born in 
Brook iyn, New York and 
studied mathematics at ihe 
Massachusetts Institute of 
Technology (BS 1 963 J and 
New York University (PhD 
A I 1972) He was a university 

^^L "\y^^ professor before coming to 

^k %^^^ HP m 1964 and recently left 
IK ^ft the company An R&D soft- 

ware engineer, he worked on a configurator for the 
HP 3000 Computer and on an interactive informa- 
tion structuring tool He has published five papers 
on stochastic partial differential equations and 
three papers on computer science and is a 
member of AAAI, ACM, IEEE, and the Society lor 
Industrial and Applied Mathematics Bob and his 

• : -n the Sean le , Washing I onareaand have 
one child 



NOVEMBER t966 HEWLETT-PACKARD JOURNAL 29 



)Copr. 1949-1998 Hewlett-Packard Co. 



Predictive Support: Anticipating Computer 
Hardware Failures 

Predictive Support software for the HP 3000 Computer lives 
on the customer's system and notifies appropriate 
personnel of impending failures. 

By David B, Wasmuth and Bruce J. Richards 



IN THE BUSINESS of customer support it is very desirable 
to identify and repair weak or failing products before 
they degrade to the point where the Impact on system 
users is significant. To this end, Predictive Support soft- 
ware for the HP 3000 Computer was created. 

Briefly stated, Predictive Support is an HP 3000 software 
product that lives on the customer system and periodically 
examines the soft error rate of the various system compo- 
nents. When these rates approach uptime threatening 
levels, the Predictive Support system automatically notifies 
the appropriate person so that corrective action can be 
taken. The current implementation of Predictive Support 
covers all system disc: drives, magnetic tape drives, and 
system memory. 

Products being considered for predictive analysis must 
have a high degree of internal error detection and reporting 
capability for Predictive Support to be effective* Once a 
candidate has been setaded. product experts from HP's 
field Response Centers and manufacturing divisions model 
the degradation and failure modes of the product. The re- 
sults are reduced to a set of product independent rules to 
be incorporated into the Predictive Support product. The 
definition nf the rules is a dynamic process, so the Response 
Center experts must pay constant attention to the effective- 
ness of the rules established for each product. 



Predictive Support Software Operation 

After Predictive Support is distributed to the customer 
system, the analysis begins. Predictive Support executes 
in four basic phases. First, error data is collected. Second, 
the error data is reduced to a generic format. Third, trend 
detection is performed, and finally, if necessary, the appro- 
priate actions are taken to solve any problems. Overall 
processing is controlled by the predictive monitor process. 
Fig. 1 shows a graphic representation of the Predictive 
Support architecture, 

Predictive Support uses special utility programs to co]- 
lect error data. Each utility is launched as a child process 
and retrieves the error data for a specific class of products. 
Predictive Support currently uses three utility programs to 
collect data from system-maintained log files, internal disc 
drive logs, and processor memory logs, 

In the second phase of processing, the utilities translate 
the myriad of error data formats into a common message 
format, so the predictive monitor can use the same trend 
detection algorithm to process the error data to determine 
whether ;i failure is imminent an the specific system com- 
ponent. The messages contain information identifying the 
product and the class of error involved, in addition, error 
class specific information that is not needed for trend de- 
tection, but is necessary for further definition of the particu- 



Predictive Support takes action by nolitying either 
the Response Center or the operator 
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Is printed after every run 



Fig. 1 . Structure of the Predictive 

Support software 
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lax error, is appended to the message and passed on to the 
eventual troubles hooter. 

When the messages are received by the predictive 
monitor process for third-phase processing, the error data 
is passed through a trend detection algorithm, if the results 
indicate that an undesirable trend has been established, 
the appropriate action is triggered. For most products, the 
vast majority of soft errors are normal and transparent to 
system users. The trend detection algorithm is de* 
: bed in more detail later. 

Predictive Support Output 

Taking the specified action is the fourth and final phase. 
The predictive monitor notifies the appropriate person so 
that corrective action can be carried out by the customer 
or by HP support personnel. In cases where the customer 
is notified (e.g.* a specific medium is marginal, or a drive 
needs cleaning) , the predictive monitor sends a message 
to the console, informing the operator about the problem. 
In cases where HP needs further investigation of a potential 
problem. Predictive Support uses a communication link to 
transfer the information directly to the Response Center 
for investigation. At the end of every run of Predictive 
Support, regardless of the actions taken, a report is gener- 
ated listing all of the messages output during this and pre- 
vious runs of Predictive Support. This provides a hard -copy 
record of Predictive Support activity. 

The data communication link between the customer sys- 
tem and the Response Center uses the remote support 
modem installed on most of the HP 3000 systems in the 
U.S.A. At the end of the run, if messages have been gener- 
ated for HP support. Predictive Support requests the system 
operator's permission to use the modem. Once permission 
is granted. Predictive Support autodials the modem, if pos- 
sible and allowable, i filling into a secured port on the Re- 
sponse Center job management system. A single file is then 
transferred (with error checking] from the customer system 
to the Response Center. At each step of the transfer the 
customer has complete control over the connection. 

At this point Predictive Support has completed its execu- 
tion, in the cases where it has transferred action data to 
the HP Response Center, the HP engineers may investigate 
the problem further and recommend the appropriate course 
of action. Often this recommendation requires on-site ac- 
tion by the field customer engineer, but the downtime will 
be scheduled at a convenient time, This advance warning 
saves the customer the inconvenience of unscheduled 
downtime and allows HP to schedule customer engineering 
resources more productively. 

Trend Detection with Predictive Support 

The objective of the Predictive Support software is to 
monitor treods In computer system operation as they occur, 
notifying the appropriate person when a trend indicates 
that there may be a problem. It was obvious that some 
statistics must be gathered and analyzed, but the questions 
E>f how to keep these statistics and what to count had to 
be resolved. 

To analyze system trends, we need to identify enti! 
that can be monitored in the hope of detecting signs of 
impending failures. The most familiar source of failures 



on a computer system is the hardware, e.g., peripheral de- 
vices. The entities that are monitored by Predictive Support 
are referred to genericaily as devices. Nonperiphera! de- 
vices such as memory are referred to as pseud odevices. 
For each type of device that is to be monitored, some set 
of significant programmatically observable events must be 
identified so their occurrence can be tabulated and 
analyzed. The best example of an event to be monitored is 
an I/O error condition, such as a recovered read error on a 
disc drive. As implied by the designalion, these errors are 
detected and corrected by the drive and do not have a 
noticeable effect on performance or integrity. A single oc- 
currence of such an event would not indicate a problem, 
but when the frequently of recovered error occurrence 
reaches a certain level, a pending failure may be indicated. 

For trend detection, the frequency of occurrence of these 
significant events must be calculated for each device being 
monitored. Merely tabulating the occurrences of an event 
will not, however, yield the frequency of occurrence. We 
therefore need to track some other factor to weight the 
occurrence of the significant event. Since simple continu- 
ous tracking of event occurrence and weighting factor will 
lead to a dilution of the statistics, the accumulation of the 
weighting factor must be limited to create a sample across 
which the frequency of occurrence can be analyzed* The 
sample can count something related to the event, such as 
the number of disc accesses, or a time interval. To identify 
an undesirable trend, a threshold must be defined where 
occurrence value, relative to the defined sample size, 
is considered undesirable. 

In summary* to detect abnormal system trends, we need: 
1) entities to monitor, that is. devices and pseudodevices, 
2} definition of significant events that occur on the devices, 
and 3) rules defining both the samples across which event 
frequencies are monitored and the relative thresholds of 
event occurrence where problems may be indicated. 

Trend Detection Data Needs 

The Predictive Support software system uses two major 
data structures to manage the collection and analysis of 
the statistics kept for detecting system trends. One is the 
trend lo& a repository for collected event occurrence data. 
The other is the normalization matrix, which contains a 
set of rules governing the logging of event data in the trend 
log and the interpretation of the resulting trend data, The 
matrix makes the trend detection process table-driven. 
There is a unique set of events defined for each type of 
device monitored and one or more rules in the matrix for 
each event* A single element of the trend log is referred to 
as a bucket and contains the current event occurrence and 
sample values, along with a time stamp associated with 
the last time the bucket was updated. There is logically 
one bucket in I he trend log for each normalization rule 
associated with each actual device or pseudodevice config- 
ured on the system. The buckets are grouped into sets by 
device type. The set size is determined by the number of 
devices configured as the given type, leading to a one*to~ 
one correspondence between rules and bucket Bel 

The rule has two parts that control the collection and 
analysis of statistics in the buckets of its set; a bucket def- 
inition and an action specification. There are fields in the 



NOVEMBER 1986 HEWLEn- PACKARD JOURNAL 31 



)Copr. 1949-1998 Hewlett-Packard Co. 



Systems Design for Worldwide Delivery of Customer Support 



Customers using HP products count on fast, effective support 
to maximize the use of those products. HP meets this commitment 
by providing support services for hardware and software prod- 
ucts locally from field offices and remotely from centralized Re- 
sponse Centers throughout the world. This cooperative approach 
to support maximizes the use of HP resources and returns cus- 
tomers to operation as quickly as possible Fast delivery of so- 
lutions, in turn, yields the parallel results of increased resource 
availability and decreased cost of ownership 

The development of systems and tools to be used in the prob- 
lem-soJving process — including the software described in this 
issue on pages 30 to 56 — is the charter of the Knowledge Systems 
Laboratory, a part of HP's Customer Support Operation Another 
part of the operation, the Support Systems Laboratory, develops 
the management information systems used in the process. 

In evaluating possible projects for the Knowledge Systems 
Laboratory, we identify severaf areas of focus. To increase re- 
source availability, we concentrate on the following approaches 
to problem solving: predicting failures before they occur, re- 
sponding quickly when they occur, repairing products rapidly, 
and diagnosing problem causes accurately. To decrease cost 
of ownership, we concentrate on solving problems quickly, sol- 
ving problems once, managing both human and material re- 
sources efficiently, minimizing training costs, and sharing infor- 
mation on a worldwide basis. Since the problem-solving focus 
areas interrelate, many proposed proiects make improvements 
in more than one area. 

Although all software developed by our laboratory is intended 
for internal use only, it is subject to the same standards of quality 
as external HP software. All projects selected for development 
must be designed to user specifications for functionality, usabil- 
ity, performance, reliability, and supportability. Expert systems 
toots must meet the additional criterion of an easy-to-update 
knowledge base for long-term usability. Updatability thus becomes 
another criterion guiding the design of our projects Some appli- 



cations may calf for ultimate delivery as self-contained or cfosed 
systems for which we maintajn the contents while others are best 
delivered as operating shells for which we provide only the struc- 
ture and the user maintains the contents. 

One project that promised positive results in both resource 
availability and support costs was thecapability to predict failures 
on key elements of an HP 3000 system Since electromechanical 
devices account for a significant amount of system downtime 
(and a corresponding amount of on-site calls), these products 
were identified as likely candidates for proactive support tech- 
niques. Error detectron and reporting capability proved to be 
equally important features for products suited to this type of 
support, and disc drives, tape drives, and system memory were 
uftimately selected as subjects for our Predictive Support soft- 
ware project. The software system revolves around a single al- 
gorithm delivered as a closed system to customers, Predictive 
Support software has proven successful in increasing customer 
system availability, scheduling on-site visits, and centralizing 
customer system information. 

To improve results rn the areas of rapid repair and accurate 
diagnosis, another project team focused on the area of expert 
systems for use by field and Response Center engineers- Since 
success with these systems depends on a dearly defined do- 
main, the specialties of HP 3000 dump analysis, asynchronous 
terminal -to- HP 3000 d at acorn links, peripheral troubleshooting, 
and system configuration were selected as the domains of four 
separate projects. As some of the first practical applications of 
knowledge engineering techniques, the projects have made con- 
tributions to the training of engineers with limited experience as 
well as to the skill level of engineers whose expertise lies outside 
of these domains. 

Bfenda Mariani 

Product Support Manager 

Knowledge Systems Laboratory 



bucket definition that define the bucket's sample value in 
terms of the normalization method and maximum size. The 
method specifies whether the sample is a time interval or 
an accumulation of some other event (e.g., disc accesses). 
The sample value in the bucket cannot exceed Un« 
maximum size defined in the rule. If the logging of event 
data causes this condition to occur, the occurrences relative 
to the obsolete portion of the sample must be discarded 
before the new event occurrences can be added in. The 
formula used to reduce the occurrence value when this 
overflow condition arises is described in detail in the box on 
the next page. The bucket definition also includes the event 
occurrence threshold. If after event data is logged the buck- 
et's event occurrence value is at or above the threshold, 
the action defined by the action specification part of the 
rule will be taken, 

The actions of the Predictive Support system will fre- 
quently involve communication with human users, namely 
the customers and/or HP T s support engineers. After the mes- 
sage is sent to the appropriate person, there are three op- 
tions: the bucket values (time stamp omitted] can be left 
alone, reset to zero, or reset with event processing sup- 



pressed (only if the bucket has a time normalization 
method]. For example, some disc devices have their own 
microprocessor controller for which the software Is held 
in ROM. The utility for examining their logs only supports 
the current (and future) ROM revision, and it checks the 
revision level before reading the drive logs, reporting an 
event if the customer's drive is using an unsupported ver- 
sion. This condition is then reported to the Response 
Center, and a service call is initiated to upgrade the ROM. 
Service calls of this nature are usually scheduled weeks in 
advance, so there is no need to continue reporting it each 
time Predictive Support is run. For this reason, event pro- 
cessing is suppressed for that bucket for the length of the 
sample size (3 weeks). 

Predictive Event Processing 

During event processing t the predictive monitor 
launches each utility, one at a time, and waits for it to send 
a message, it wakes up when a message is received, and if 
it is an event record, the following algorithm is performed 
to log the trend data: 
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■ Use the device ident^cation information to find the first rule rn the 
normalization matrix 

■ Read the trend log values associated with the rule, and use the time 
stamp, along with the event record s time stamp* to compute Hie 
interval to be used as the event record s sample value for rules with 
a time normalization method. 

■ FOR EACH rute associated wrth the event number and device type 
specified in the event record, log the event record's trend data into 
the trend log as follows: 

IF the rule has a suppress action, 
AND the bucket is suppressed, 

AND the time since last update is less than the rules maximum 
sample. 
THEN skip the following steps, continuing with the next 
rule 
Use the algorithm described in the box on the right to log the event 

data into the trend log 
IF the bucket's new event occurrence value is greater than or 
equal to the threshold defined in the rule, 
THEN take the specified action by outputting the message 
to the appropriate destination and optionally resetting The 
bucket values and suppressing further event processing, 
a Continue with the next rule and. when all rules have been applied, 
wait for the next message. 

Conclusion 

Using the files and algorithms described in this article, 
the Predictive Support software system collects meaningful 
system data. This data is gathered and analyzed in terms 
of statistics that provide insight into the trends of computer 
system operation. This information can then be used to 
detect abnormal system trends in an effort to identify and 
repair weak or failing products before their impact on sys- 
tem users becomes significant. Clearly, not all failure 
modes of products manifest themselves as degradations 
before eventual failure. But for I hose frequent cases where 
this does occur, Predictive Support is quite effective. 
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Logging Event Data in the Trend Log 

Logging event data involves adding the event record values 
to me current trend log values. The resulting event occurrence 
s then c om pared wftj : nolo to etel 

\ b action 

Three pairs of values are used fc r ection X values 

are associated with event occurrences and Y values are as- 
sociated with the sample values. In this description eac^ 
pairs will have a subscript desrgnation The count and wetghr 
values from the event record use the subscript e For example 
H the normalization method Is time Y e = trme s^nce last update 
The trend log values for current event occurrence and sample 
values stored in a bucket use the subscript b. The threshold and 
maximum sample values defined in the normalization rule use 
the subscript r 

When an event records values are logged, one of three mutu- 
ally exclusive conditions exists. The equations used to set the 
new bucket values in each case are as follows; 

Case 1 : Y e 3*Y r (event sample larger than maximum) 
X b ±- (Xa/YJY, 
Y c «-Y, 

Case 2: (Y^ + YJ ^Y r (bucket sample will not overflow) 

x b *— x b + x 

V t -Y b + Y e 

Case 3: (Y e + Y b )>Y f (bucket sample will overflow) 
X b — X b - (Xb/YbM Y b + Y e - YJ + X e 
V b -Y r 

Under Case 1 the event record values cover the entire sample, 
so the old statistic is discarded and the ratio of the event record 
values is multiplied by the maximum sample, resulting in the new 
X D value 

Under Case 3 an attempt is made to discard obsolele occur- 
rences, so the statistic is evaluated only tor the sample specified 
in the rule The X b value must be normalized before X^ is added 
The amount of overflow is multiplied by the current ratio of the 
bucket (i.e. frequency of occurrence), and the result is sub- 
tracted from X b before X,, is added 



NOVEMBER T98€i HEWLETT-PACKARD JOURNAL 33 



)Copr. 1949-1998 Hewlett-Packard Co. 



AIDA: An Expert Assistant for 
Dump Readers 

This expert-system-hased program increases human 
readers 1 productivity and success rate in HP 3000 memory 
dump analyses, 

by Lynn R, Slater, Jr., Keith A. Harrison, and Craig NL Myles 



JL IDA (Automated Interactive Dump Assistant) is an 
M^L application that is designed to assist the human 
dump reader in the analysis of memory dumps taken 
from the HP 3000 family of computers* 

Dumps (see box, next page) are read for two main reasons. 
The support engineer may simply wanL to know what the 
problem was, to provide a workaround in as shorl a period 
of time as possible. Alternatively, the engineer may wish 
to determine the precise sequence of events that caused 
the failure, to effect a cure. 

Dump analysis requires an unusual set of skills. The 
dump reader must be very familiar with the details of the 
operating system and its subsystems to relate memory loca- 
tions to memory contents at the time of the dump, The 
dump reader needs to be able to scan dynamic links, iden- 
tify the role of each memory portion, determine what was 
happening at the time of the dump, determine what is 
unusual, and from this determine the general nature or 
exact cause of the problem. 

In analyzing a dump, a dump reader must know many 
details- Many operating system data structures are variable 
in format* Normally legal values may be inappropriate de- 
pending upon other parts of the dump, and normally in- 
valid values may be appropriate for short times as the 
operating system updates other values, Dumps may be the 
expected result of an improper configuration* may be a 
known problem with complex symptoms, or may be 
unique. Also, the direct cause of a problem may have been 
the result of memory improperly modified by some other 
cause. The initial cause of the problem may not be in mem- 
ory or may have been overwritten by the effects of the 
problem. 

Dump readers are hindered by the lack of any single 
source for their detailed knowledge. Within Hewlett-Pack- 
ard, the dump reader's task is further complicated by our 
commitment to supporting multiple versions of the operat- 
ing system. Our dump readers must master differing sets 
of detailed knowledge for each version. An expert system 
provides a central repository for this information and also 
guarantees that it will not be forgotten or lost as experts 
move to newer releases or change jobs. 

We cannot reduce the inherent complexity of the dump, 
but we can try to encapsulate the detailed knowledge, 
which can then be thought of as rules, 1 in a tool. Reducing 
the amount of detailed knowledge needed to read a dump 
both speeds dump analysis and increases the pool of dump 



readers. We can further reduce the time needed to read a 
dump by allowing user management of the information 
presented, 

All this detailed knowledge must then be applied to every 
significant portion of memory. Any portion may be cor- 
rupted* and any corruption is a clue to the nature of the 
problem. Humans cannot apply their expertise to every 
portion of the dump and still complete the analysis in a 
reasonable amount of time. Instead, they must start from 
a few leads and try to discover the portions of memory 
worth detailed analysis. On the other hand, an automatic 
tool can scan each portion of memory as it is first encoun- 
tered and report unusual conditions to the dump reader. 
Human dump readers, relying on personal experience and 
knowledge unobtainable (or uncodable) to the tool, can 
then decide the significance of the corruptions. 



Corruption Detected in ; CONFIGURATION 

Because : Ldev 11" s CPVA(O) word £142036) O 

#*** njvyv Abrjrt - CHRB3 B - 002036 Mem Addr 4*150000 



Corruption Detected in : CONPIGUPATIOW 

Because ; Ldev 100 is a real device with a DIT pointer of 



Corrupt ion Detected In : C0NFIQ rRATTON 

Because : Warning: Ldev 163 default output is to ldev 162 



Corruption Detected in : CONFIGURATION 

Because : Warning: Ldev 18.1 default output is to ldev 1S2 



Corruption Detected in * CCWFIGURATION 

Because : Gic 4 on 1MB only contains INP/Translator Cards* 

Please check that there is no HPlB hood cable on the GIC card, 

an this can cause memory corruption - 



Corruption Detected in : CDNFTGURATICiN 

Because : 1MB I has 3 high speed £ics cuniiqured. 2 is the maximum 



corruption Detected in : PS . - 

Because : Number of WQ table entries fO) <= 

Assuming 1130 entries 



Corruption Detected in : MEMDRV-4- 150000 
Because : Found a region of length = 



Fig, 1, Typscal corruptions detected by AID A. 
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What Is a Memory Dump? 



A memory dump is a copy of the contents of a computer s 
emory along wr 'agis- 

ter contents, I/O channel progrs r 

ory dump ts a snapshot of what the computer was do»ng 
at the time of the dump. The dump is often made by spec ~ 
pose programs that either always reside in memory or are sup- 
ported by firmware These programs are normally designed to 
make no use ot the operating system and whte the contents of 
memory directly to a tape 

Many tools extst to find a problem if a regular application 
program faifs. These tools may run as independent processes 
that examine or modify the data or code of the application pro- 
gram However, if the operating system fails, diagnostic pro- 
grams will not be able to run until the system has been restarted 
and the evidence of the problem destroyed. Therefore, a memory 
dump is taken to capture the state of the machine at the moment 
of the failure. Memory dumps preserve the evidence of the prob- 
lem for later diagnosis, often this is the only evidence. Without 
dumps, experts can only make educated guesses about what 
might nave happened. Even if the problem is theoretically solv- 
able without dump analysis., the dump is often a good first clue 
to the cause. 

Operating systems may fail for many reasons. The CPU may 
encounter an illegal instruction in privileged code* externa? inter- 
rupts may occur faster than they are resolved, the operating 
system might believe that it has no work to do or be waiting for 
something that will never happen, or internal routines might detect 
abnormalities of such seriousness that they call system failure 
routines rather than risk destroying still valid data with continued 
processing. Most often, these are the final steps *n a problem 
that has already happened 

In addition to AIDA, there are two tools for the dump reader 

■ DPAN is a program that formats certain tables and memory 
from the dump tape and produces a line printer listing Users 
are faced with two options. They can print the whole dump. 
an unattractive option particularly for the larger machines, or 
they can second-guess the problem, print certain tables, and 
then, if more information is needed, print additional information 
as required, 

■ IDAT is an interactive version of DPAN. It offers a different set 
of formatted tabfes than DPAN. but presents information in- 
teractively upon demand. IDAT is also capable of accessing 
a dump tape en a remote machine To use IDAT successfully, 
the user needs to be very skilled in dump analysis. 



Automatic corruption detection alone will rarely solve 
a dump. Most often it is just the first step. The corruptions 
indicate why the system failed, but you must know why 
the corruptions occurred to solve the problem. This is the 
real task of dump reading. Experienced dump readers often 
scan source code or uncorrupted memory for clues. Even 
if some of this process could be automated, there will still 
be portions of dump analysis that will depend upon a per- 
son's ability to solve problems never before encountered. 
Until a tool can guarantee an automatic solution to any 
memory dump, human dump readers will he Involved. 
Therefore, to allow the user to pursue the problems llu* 
tool did not solve, a good dump reading tool should act as 
an expert formatter as well as an automatic analyzer. 



AIDA Overview 

AIDA provides the following capabilities: 

■ The automatic detection of simple data structure or sub- 
system corruption 

■ The automatic analysis of some types of class problems 
« Ergonomic presentation of data from the dump. 

AIDA is an expert-system -based application intended for 
use by support engineers in the HP Response Centers. HP 
Customer Escalation Centers, or HP product divisions to 
solve customer problems in less lime and with more con- 
fidence that the problem will not recur. It can also be used 
by product divisions to debug new T operating system com- 
ponents. 

AID A concentrates on trying to identify the problem au- 
tomatically while providing dump readers with the tools 
to pursue their own lines of inquiry. 

AIDA automatically detects standard problems such as 
resource deadlocks, and eliminates the need for the human 
dump reader to process these well-defined problems, AIDA 
also detects most forms of corruption in MPE tables or 
subsystems. [MPE is the Multiprogramming Executive, the 
HP 3000 operating system.) This automation frees the dump 
reader to concentrate on the more challenging aspects of 
the dump. However, AIDA cannot automatically examine 
the entire dump because of the dump's large size and the 
need for dump readers to have fast Interactive results- Fig. 
1 shows typical corruptions detected by AIDA. 

While AIDA cannot guarantee an automatic solution to 
any problem presented, it is a superior formatter which 
makes it easy for dump readers to pursue independent 
lines of inquiry without changing their environment or 
their thought flow, 

Assuming that the problem is not automatically detected 
by AIDA. or the user wishes to determine why the problem 
occurred, the reader can display any table or data structure 
resident in memory at the time the dump was taken. This 
data is displayed in as informative a way as possible. As 
the user directs AIDA to bring in more information from 
the dump, this new information is automatically scanned 
for errors or inconsistencies; these discoveries are added 
to a special corruptions data base and reported, The corrup- 
tions can often guide dump readers to a problem or 
symptom that otherwise may have gone unnoticed. 

In practice, human dump readers do not normally com- 
pletely analyze every dump they process. At times they do 
not invest the effort needed to follow the com pi h 
chains that AIDA does and either give up or need a second 
dump for comparison. A major contribution of AIDA is its 
superior Formatting, which allows it to follow these chains 
automatically. Hence, with AIDA the solution rate goes up. 
It does not just speed up the process, encapsulate knowl- 



Dump 
Access 



User 



HP 9000 
Made! 236 


Method 


* " n 








Dump 
Image 



Fig. 2. AIDA runs on an HP 9000 Series 200 Workstation, A 
process on the HP 3000 called the dump access method 
sees the workstation as a terminal 
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edge, and let nonexperts read dumps, it makes problems 
solvable that were not solvable before. 

Environment 

AIDA runs on an HP 9000 Series 200 Workstation run- 
ning underthe Pascal 2.1 operating system with a minimum 
disc size of 45 megabytes. This configuration was chosen 
because it was compatible with the prototype version of 
Hewlett-Packard's AI development environment. 1 

To obtain information from the HP 3000 dump tape, the 
strategy depicted in Fig. 2 was adopted. A process on the 
HI 1 3000 (the dump access method) sees the workstation 
as a terminal. Whenever the workstation requires informa- 
tion from the dump it sends a message to the dump access 
method, which parses I he request, extracts the relevant 
data from the disc image of the dump tape, compresses the 
data, and writes it nut to the workstation. All data transfers 
are checksummed and numbered for safety. 

The HP 9000 portion of AIDA runs in a Lisp-based envi- 
ronment which provides compiled and interpreted Porta- 
ble Standard Lisp. 2 objects/ 1 and an EMACS-like editor 
called NMODE,' 4 The NMODE editor supplied the code 
development environment as well as the foundation of the 
user interface. AIDA can be thought of as an NMODE exten- 
sion that generates and manipulates data in ways particu- 
larly suitable for dump reading. Most of the information 
management described later in this article is taken from 
NMODE. By building on top of NMODE r AIDA is able to 
support aoy number of independent buffers, any number 
of windows on a large number of devices , infinite scrolling 
horizontally as well as vertically, context sensitive oo-line 
help, list browsing with filters, and other features with 
very little development effort. These features are all bene- 
fits of the standard user interface utilities supplied by 
NMODE. Without these features, AIDA 1 s contribution 
would be significantly lessened. 

Knowledge Acquisition 

When human dump readers start to read a dump, they 
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Fig. 3. Formatted header section of the MPE logging table 



Fig. 4. A portion of the formatted data section of the loader 
segment table. 

scan the dump looking at the registers, process control 
block table, and other areas, trying to get a "feel" for the 
state of the machine. If they spot something that doesn't 
feel right, I hey direct their energies to trying to explain 
their observation before resuming with their initial search. 
The AIDA project determined and subsequently modeled 
the knowledge of the operating system, machine architec- 
ture, and typical search order by taking protocols of human 
dump readers analyzing selected dumps. A mixture of stan- 
dard test cases and random dumps supplied by customers 
was used. It was found beneficial to choose the experts 
carefully. The most useful were engineers who were able 
to communicate easily. Also, the less experienced en- 
gineers were preferable because they still performed their 
analyses deliberately. As ihey became more competent they 
tended to work more instinctively and it was harder to 
determine why they worked the way they did. 

Analysis Phase 1 

The first phase of analysis is designed to emulate the 
human dump reader's initial scan; it gathers the minimum 
amount of initial facts needed to begin processing. The 
connection to the dump access method is created and the 
dump file opened. AIDA checks the file type and a few 
memory locations that allow AIDA to decide whether the 
file is a dump file taken from MPE V/E or later. AIDA then 
extracts and locally caches a predefined set of data via the 
dump access method. As this data is brought in, it is 
scanned by the automatic corruption detectors. It is possi- 
ble that nothing unusual will be found during this first 
pass. At the end of this phase, the user knows miscellane- 
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ous data such as CPU model, operating system version, 
time of failure, system failure Dumber, current instruction 
(automatically decompiled), console device, and lan- 
tern welcome messages. Also, if the dump is from a system 
failure, explanations of the failure, probable causes and 
resolutions, and the calling modules or procedures land the 
line of code within them] are extracted from separate master 
lists maintained by the MPE development laboratory, the 
Response Centers, and each dump reading site. This pro- 
cess takes [ess than 90 seconds with a 1200-baud phone 
link. The dump reader can direct AIDA to verify the HP 
3uu(Ts configuration or scan and verify key tables such as 
the CST, DST t and PCB during this initial phase. This ad- 
ditional processing requires extra time. 

Analysis Phase 2 

Once the initial phase of data collection and analysis 
has been completed, all discovered dump inconsistencies 
are reported. The dump reader can now scan these mes- 
sages and the miscellaneous facts derived from the dump 
to decide what to do next. The dump reader can have AIDA 
execute a user-defined suite nt commands that may uncover 
further corruption while the reader works at some other 
task, 

At this point, the dump reader is using an interactive 
formatting tool. Each tinie new data is requested, AIDA 
automatically scans it for corruptions. This fetch and scan 
on demand strategy is necessary hecause the size of a dump 
prevents automatic scanning of all of the dump in a time 
acceptable to an Interactive user. 

Internal Structure 

AIDA kiips the formatting and corruption detection task 
separate from the control, presentation, and information 
management (user interface) task. This helps keep the 
knowledge of MPE separate from the user interface design. 
The tasks are interdependent and both tasks are always 
active. 

Formatting and Corruption Detection 

There are many tables rind subsystems used in MPE, but 
generally only relatively few are significant to any particu- 
lar dump. To reduce our memory requirements, AIDA has 
adopted a load on demand strategy: the AIDA code required 
to format or analyze a table or subsystem is brought into 
memory only as that table or subsystem is requested by 
the user. Besides reducing our memory requirements, this 
strategy has helped us keep our code very modular and 
-nil i ured SO thai additional tables or subsystems are easily 
added. AIDA does not have ihe capability to unload code 
once the formatting or corruption detection is complete 
While this could theoretically cause AIDA to run out of 
memory as it reads a dump, this has only happened under 



maximum test conditions on our smallest workstations. 

To add a new table or subsystem formatting capability 
to AIDA, entries are added to internal AIDA tables, These 
tables teO AIDA what command or commands use the code 
in the new module, the corruption checks performed by 
the module, and how to find the module. Each module 
also lists what other modules it needs loaded. The user 
interface portion of AIDA keeps track of the loaded modu les 
and ensures that all the appropriate code is available before 
the execution of any command. 

All modules call upon a set of fundamental utilities that 
access memory, formal fields, validate value ranges, ma- 
nipulate internal data structures or objects, supply data to 
the user interface, extract or set particular bits, or perform 
other low-level internal manipulations. All routines are 
written with the expectation that their arguments may have 
been derived from a portion of the dump with undetected 
corruption and thus may be nonsensible. The utilities re- 
turn special innocuous values that will let the corruption 
become visible to the user without aborting any part of 
A11JA. This lets AIDA handle dumps with corruptions not 
anticipated by the expert who wrote the module. In the 
few cases where a routine does terminate abnormally, the 
structure of the Lisp environment and the NMODE editor 
underlying AIDA allows a graceful recovery where a com- 
mand, column, or row is inaccessible, but the rest of AIDA 
still runs. Users can always display the unformatted raw 
memory to discover the nature of the corruption. 

Tallin and memon reference routines BXB 'itiii'Nt; rise 
utilities available to Ihe formatting modules. These are the 
Only routines that actually access the dump data. These 
routines keep track of which portions of the dump have 
been accessed and cause the appropriate consistency and 
corruption checks to execute whenever new portions of 
the dump are accessed. They are. in effect t forward chaining 
rules whose tiring is based on the availability of data, Thus 
every portion of the dump is checked and corruptions re- 
ported as SOOB as ihe data becomes available lo AIDA. 
Corruption checks that depend upon the data from more 
than a single portion of rneinorv may bring in all desired 
portions of memory whenever any portion is brought in, 
may wait until all portions are brought in r or iimy execute 
when most of the memory is broughl in. AIDA MPE experts 
used ail these strategies in coding different corruption 
' hecks. 

There are basically two kinds of formatting; tables and 
subsystems. 

Table Formatting. The formal ting of a table involves dis- 
playing a single logical MPE structure as it appears to MPE. 
MPE is a table-driven operating system and the examina- 
tion of tables is vital to dump readers. A table consists of 
a header section and a data section. Entries in the header 
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section apply to the table as a whole and may contain 
information such as number of entries, entry size T first free 
entry, etc, The data section is divided into any number of 
en Inns, each of which appears as a row and relates to some 
Internal MPE logical entity. Each entry is further sub- 
divided into any number of fields, which appear as the 
columns ol the table. Some tables do not have both sections. 

Our observations of expert dump readers showed that 
they are concerned either with some aspect of the whole 
table or with just some particular entries. They rarely 
examine the header and the data portions of a table a I tin- 
same time. AJDA displays these sections independently. 
Fig. 3 shows a formatted tabic header section and Fig. 4 
shows a formatted data section. 

All table formatting is driven by a standard formatter. 
For the header section, the table specific code just specifies 
the labels of the entries and later the formatted values of 
the entries for each particular instance, For the data sect ion, 
the table specific code specifies the labels of the fields 
(which become the columns of the I able) and later the 
formatted values of the fields for each row. The transforma- 
tion of this internal data into a display is handled by the 
control, presentation, and information management por- 
tions of AID A. A key point of both table sections is that 
the table specific formatting routines are concerned only 
with the MPE specific logical format of the table and not 
with any of the implementation details such as screen 
width or whether the table is to be printed on paper, 

AIDA often displays data that does not actually exist in 
a table but is logically related. If a table contains informa- 
tion about another table, this information may be displayed 
in the header section of the other table even though MPE 
does not physically store it there. Similarly, rows may have 
columns added where the data is taken from other tables. 
Where there is a great deal of related data lor a row, and 
where this data would overwhelm the user if presented for 
all rows, AIDA will display a simple form of the rows but 
will let the user expand any row [Fig, 5), When a row is 
expanded, formatter routines are called with the row and 
table as arguments and can return any amount of data taken 
from any number of dump locations. The nature of expan- 
sions varies greatly with each table. For example, code 
segment table entries expand into decompilations of the 
actual code segment, and device information table entries 
expand into the channel programs. MPE tables were de- 



signed for customer performance and not for dump reading; 
these logical enhancements to the actual table structures 
make MPE more readable. 

Subsystem Formatting, The formatting of a subsystem 
involves retrieval of data from many portions of the dump, 
AIDA includes in subsystem formatting the presentalion 
of data that, while not a table or subsystem to MPE, is seen 
as logically related by a dump reader (Fig. &), The display 
of DS users is an example of formatting of a true MPE 
subsystem (Fig, 7) The display of all of the executing pro- 
grams along with oser name, job or session number, and 
capabilities is an example of useful information that is not 
properly a subsystem to MPE. Unlike the tables, there is 
no common structure for subsystem formatting. 

Control, Presentation, and Information Management 

The user interface for AIDA presents some special chal- 
lenges. The human dump reader, analyzing a forma I 

printout of the dump, is generally juggling many pieces of 
Information, using pens, rulers, paper clips, etc, to rmirk 
places in the dump. Ideally, a user interface should provide 
the same capability, The user interface should assist the 
dump reader in the location and interpretation of the data. 
The user should not have to refer to an external architec- 
I lira I design document such as the MPE Tables Manual. 
Multiple Butlers and Windows, Any number of tables or 
subsystems may need to be visible simultaneously. Ini- 
tially, AIDA offered a table selection menu and graphics 
tablet to select an MPE table and an icon representing the 
desired action. However, the combination of pen and 
keyboard proved cumbersome and the keyboard alone be- 
came the input device. 

AIDA adopted from NMODE the ability to display any 
nomber of buffers with independent data in any number 
of windows. Each buffer can be thought of as a separate 
dialog between AIDA and the user. This independence 
allows the user to pursue many trains of thought. These 
buffers are very similar to the ones used in the EM ACS 
editor. The underlying NMODE editor allows qoick and 
natural motion from buffer to buffer. Except for textual 
commands, users can navigate through AIDA and the dump 
with single-keystroke commands. For performance reasons, 
AIDA limits users to two windows on devices that are not 
bit mapped. 

One of the buffers, the MPE tables browser, contains a 
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list of all MPE tables available in the dump (Fig. 8). Users 
move the cursor to pick a table and use single-keystroke 
commands to read it. Another buffer accepts subsystem 
formatting commands and creates new buffers to display 
each subsystem. Another buffer contains optional values 
that the user can edit to control AID As operation, 
information Red uu ion A major problem in any dump 
analysis is information overload. Most of the dump 
no significance to the problem. Normally, a dump re-ad er 
analyzes a dump in a constrained and Oeeting manner. It 
is normal to consider only one or two words of a table 
before going to a different table. The dump reader's task is 
to find and isolate the significant portions of each table or 
subsystem. Tools that just list formatted memory (either 
on paper or to a screen) are of no assistance here: the sig- 
nificant data is mixed with the insignificant, Rather than 
just listing to a screen, the user interface of AIDA ret ■«■ 
tables from the formatter in a data structure thai preserves 
rows and columns. Initially, the whole table is displayed, 
but facilities exist for the user to control which columns 
are displayed and to filter down to those entries \ha\ have 
a field equal to a given value (Fig. 5). This filtered list can 
be subsequently filtered. For example, when given a format- 
ted process control block table 314 characters wide with 
500 entries, a user might choose to look only at the 
wakemask (a column of the PCB table) of those entries that 
represent nonuser processes that are critical, hold an SIR, 
and are impeded. Instead of searching a display of 314 
characters by hundreds of lines* the dump reader examines 
only two lines of 71 characters. This column and row spe- 
cific filtering can he applied to any AIDA table format. The 
table -driven nature of MPE means that the user interface 
does not need any understanding of the semantics of the 
fields for the filtering to be significant in most cases. Also, 
for most tables it is possible to select an entry for a more 
detailed explanation. 

The varied nature t>l subsystem formatting prevents 
AIDA from providing a general information management 
scheme such as the column selections and fillers used on 
lable.s. Instead the formatting is initiated by textual com- 
mands from the user (Fig. 9). These commands accept ar- 
guments that control the nature and depth of detail of the 
resulting display. Generally, a fresh buffer is created for 
each command t but AIDA avoids huffer overpopulation by 
combining the results of short formatting commands in!r» 
a single buffer known simply as OUTPUT, The subsystem 
formatting interface is designed to simulate an interactive 
IDAT session. This design eases the dump readers" transi- 
tion into using AIDA. 

On-Line Help. The usability of AIDA is enhanced bj a 
context sensitive on-line help system. The user can ask for 
help from anywhere in AIDA. The underlying NMODE 
editor knows what buffer the user was in and from I ha I 



determines the purpose of the buffer and the appropriate 
help chapter. A buffer is created containing this chapter 
and substituted for the buffer the user was in. If the chapter 
does not coniain the necessary informal ion. the user can 
go to the on-line help table of contents and choose another. 
Users are returned to the original buffer when they . 
the help system. NfPE information useful to dump readers 
is also included in the on-line help system. This includes 
known entries into system failure routines, probable causes 
of system failures, and the meaning of each field in any 
table that AIDA formats. 

Preservation of AIDA Sessions, Dump readers do not al- 
ways complete their analysis of a dump in a single session, 
AIDA provides the ability to save a partially analyzed dump 
in the current state. All processing done, all dump data 
read, and all corruptions found are saved to disc. This 
feature allows a dump reader to respond to interruptions 
or wait fur a response from another location and later con- 
tinue from the same point 

Rules in AfDA 

Readers familiar with expert systems might have noticed 
that the changing detailed knowledge required by experts 
to read dumps seems to offer an ideal application for rules 
in a traditional expert system. AIDA started from this obser- 
vation and initially used techniques similar to those de- 
scribed in the articles on Schooner and TFT (pages 42 and 
48). The results were acceptable except for performance. 

Upon examination it turned out that formatting and most 
error checks follow a simple path with few control branches* 
Rules and traditional inference engines search a large rule 
domain for applicable rules to fire. This hunt and peck" 
approach is very flexible and suitable for systems where 
there may be any number of rules in arbitrary order that 
become ready to fire in an unpredictable order. However, 
< insider the case where rule B will always fire after rule 
A and at no other time. Unless I he rule selection strategy 
is perfect, execution is faster If rule A and B are combined. 
Consider the case where rule C or rule D will fire after rule 
H and tit no other times. These rules could be stated with 
a reformulation of rules A and B as pari of their predicate 
along with whatever condition selects rule C over D or vice 
versa. Traditional inference engines would search the en- 
tire rule domain- evaluating I he predicate of all rules en 
countered until rule C nr U is discovered. Again, unless 
the rule search strategy is perfect, it would be faster to 
combine rules A t B, C,and I). (Also, many inference engines 
would have reevaluated the portions of rules C and D that 
had already been evaluated when deciding to fire rules A 
and S,] As these rules are combined, procedures are 
i reated; ihe longer the chain, the longer the procedure. The 
merger of rules A, B, C. D can be expressed as follows: 
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IF < Predicate of rule A> 
THEN BEGIN 

< Action of rule A> 

IF ■ Predicate of rule B except those parts stated in rule A> 
THEN BEGIN 

< Action of rule B> 

IF < Condition that distinguishes rule C from rule D> 
THEN <Action for rule C>; 
ELSE < Achon for rule D>; 
ENDIF; 
END; 
ENDIF; 
END; 
ENDIF: 

AID A is fortunate that in MPE each table or subsystem 



requires a largely independent body of knowledge, Dump 
analysis is a large problem domain, but the domain is di- 
vided into well-established, distinct subdomains— process 
hangs, system failures, flying bytes, etc. The knowledge 
base is compartmentalized and in each compartment the 
knowledge is easily expressible as procedures, This sub- 
division makes the problem manageable, 

Initially, AIDA held detailed knowledge exclusively in 
rules, but these were gradually condensed into procedures 
until AIDA had very few rules left. These remaining rules 
were incorporated into procedures in the appropriate mod- 
ules and all new tables or subsystems were created with 
procedural corruption detection schemes, The strategy of 
scanning for corruption on the first access to any memory 
portion serves to make these procedures essentially for- 
ward chaining rules. As we did this, performance improved 
dramatically. For example, a resource deadlock check that 
had taken over five minutes was reduced to less than two 
seconds. 

Lessons Learned 

Rules are not a necessary component of an expert system. 

The history of rules in AIDA is probably the most signif- 
icant lesson for others thinking of developing expert sys- 
tems. Rules are not a requirement for a expert system. AIDA 
is no less expert with procedures than it was with rules 
and seems to be just as easy to modify. Basically, rules do 
not provide acceptable performance for low-level detailed 
checks performed many times. Applications with well- 
compartmentalized knowledge may do best lo avoid rules. 

We do expect rules to reappear in the next phase of 
AIDA. Now that we have this foundation of accessible 
dump knowledge, we can add rule systems to perform tasks 
such as deciding what portions of memory are likely to be 
significant and filtering for them, or deciding what AIDA 
command an expert user would use next given what AIDA 
has already discovered. For this phase, we will probably 
use a rules system based on the same principles as OPS5^ * 

Supply more Hum expertise. 

The other lesson learned from AIDA is that an expert 
system that does not solve every problem presented will 
not gain acceptance unless it also acts as a tool for the user 
to solve the remaining problems. If AIDA had not been an 
expert formatter, it would not have been used because it 
could automatically solve relatively few special cases at 
first. By being part of the normal w r ork flow, A EDA can 
make incremental contributions in each step by relieving 
the dump reader of having to check all the fields and tables 
manually and by sometimes finding a problem on its own. 
\ii AIDA were not an expert formatter, each dump not au- 
tomatically solvHfl would count against it. Instead, each 
dump AIDA automatically solves is justification for further 
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A Troubleshooting Aid for Asynchronous 
Data Communications Links 

Schooner is an expert system for fault diagnosis and 
personnel training on point-to-point dataoom links. 

by Brian T. Button, R. Michael Young, and Diane M< Ahart 



THE DATA COMMUNICATIONS market is among the 
fastest growing markets of the computer field, and 
data communications support has a rapidly increas- 
ing need of expertise. Solutions in this area are perceived 
by computer vendors and customers to be critical for a 
successful total solution. The trend toward personal com- 
puters and office automation has given increasing impor- 
tance to easy installation and support of data communica- 
tions links. 

To address a portion of this problem, Hewlett-Packard 
Initiated the Schooner project. Schooner's principal objec- 
tive is to maximize leverage of HP's expertise in asynchron- 
ous data communications and reduce the time it takes to 
solve dalacom problems for HP's customers- 1 1 attempts to 
do this by solving the easier problems, thereby reducing 
the workload on HP's more experienced engineers and free- 
ing them to tacJde other problems. Schooner also serves as 
a training tool for less experienced personnel. 

Overview 

Schooner combines an inference engine and a knowledge 
base 1 to provide expert-level assistance wit h asynchronous, 
point-to-point data communications problems for fault 
diagnosis and personnel training. It verbally guides Re- 
sponse Center engineers, Field support personnel, or other 
users through the solution of problems in this area. 

The present knowledge base is oriented towards diag- 
nosis of problems in RS-232-C links connecting a terminal 
to an HP 3000 Computer either directly or with modems. 
When initiating a troubleshooting session with Schooner, 
the user designates the configuration by specifying any 
combination of terminal type, port type, operating system, 
connection type [cable or modem), and if applicable, the 
types of modems at the terminal and computer sides of the 
conned ion. 

After acquiring this information, Schooner goes to the 
main investigation phase of the session, asking the user to 
rela\ observations rmd perform manipulations. SifiCS the 
configuration description determines the rule set used in 
this phase, the session proceeds differently for mod em -con- 
nected data links than for direct (cable-connected) links. 
Schooner understands problems and characteristics spe- 
cific to the makes and models of devices in the configura- 
tion. 

Schooner Tests 

In Schooner, tests are the basic units of inference corre- 
sponding to rules in classic rule-based systems. Tests de- 



scribe a state in which observations may be made, observa- 
tions to make, and a translation from this information to 
beliefs about potential faults and the state of the system 
being debugged, 

Although tests are the basic units of inference, there are 
important differences between tests and rules. Tests do not 
chain. Each test is a unit and is applied singly without 
direct interaction with other tests, Unlike a rule, 2 once 
selected and applied, a test is no longer eligible for selection 
unless disapplied 

Under appropriate circumstances, an applied test can be 
disapplied. All inferences resulting from the application 
of the test are retracted and the test is made eligible for 
reapplication. 

The application of a test has three stages. Each of these 
stages has corresponding fields, which allow the user to 
specify what should occur there, 
: Preliminaries. Perform the manipulations to the system 

necessary before the application of the query and record 

assumptions made for the test, 

■ Query. Obtain the desired information. 

■ Inferences. Update beliefs about the system based on the 
information obtained. 



Test: 




Lo c a 1- L oopback -Thru-Modem 




Bindings: 




?config 


terminal 

r * ?confkj is 


?term 
prebound " 






?config 


datalink 


?tink 






?ttnk 


t-modem 


?modem 








* concerned with only one modem 


Requirements 


?term 


in -working-state 


yes 






?config 


te rm- 1 i n k-con nected 


yes 






? modem 


in-working-state 


yea 


Precondit 


ons 


? modem 


tn-local-f oopback 


yes 



Query: ?term do- typed -chars- appear ?typed-chars-appear 

Inferences: if ?typed-chars-appear is no 

then FAULT: 7term lermmal-fauJl indicated 

FAULT: ?contig term-nol-connected indicated 
if ?typed-chars-appear is yes 
then FAULT: ?term terminal-fault eliminated 

FAULT: ?config term- not -connected eliminated 
FACT: ?conftg term -link-con nected yes 
FACT: 7 term in-working- state yes 



End-Test: 



Fig. 1. An example of a Schooner test. This test only applies 
to modem links, tt consists of putting the terminal in a normal 
state to transmit, putting the mortem in focal loopback mode. 
and then typing on the terminal to see if characters come up 
on the screen. This tost is us&d to verify the link between the 
terminal and the local modem 
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Preliminaries 

Two fields in a test, the precondition and requirement 
clauses, are provided to allow the user to manipulate the 
system into the state necessary to apply the query. They 
are both ordered lists of simple clauses, hut they differ 
substantially in intent and use. 

In the test shown in Fig. 1 , the precondition clause; 

?modem irviocaWoopfeack yes 

specifies a stale of the modem that must be satisfied before 
application of the query. After the precondition clause is 
satisfied (Fig. 2j t the modem will be in local loopback, 
regardless of its state before the test was selected. 

Requirement clauses are used to establish an assumption 
about the state of the system, To satisfy a requirement 
clause, Schooner must bring the system to a state where it 
can assume that the clause is satisfied (Fig. 3), Having done 
so. Schooner must then tag the assumption so that the test 
information and application can be retracted if the assump- 
tion later turns out not to be valid, 

In the test in Fig, 1, the requirement clause: 

?term in-working -state yes 

specifies that Schooner must put the terminal in a state in 
which it can be assumed that it works. If some previous 
test has put the terminal in local mode, then Schooner 
must ask the user to put the terminal in remote mode. Once 
this has been done, Schooner can assume that the terminal 
is in a working state , tag the dependency* and proceed to 
the next clause in the test. 

Queries and Inferences 

The query is used to obtain information once the system 



Is (here 

a present value 

in knowledge 
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NO 




assumed? 




Adit user far value 



r Does I he value ^ 
match the desired 
value'* j " 



Yes 



T 



No 



Ask the user to establish 
the desired value 



Is present value in 
knowledge case 
= desired value? 



Yes 




Get clauses that account 

tor difference between 

desired state and present 

state. 



Satisfy these clauses 
using Fig. 2 



< Now, 
Present = Desired > 




Fig. 2. Algorithm for satisfying a precondition clause 



Record assumption 
dependencies 



Fig, 3* Algorithm for satisfying a requirement clause 

is in a state appropriate to make the observation. A query 
is never used to achieve a state, but only to obtain informa- 
tion. Hence, the item In the third position in a query clause 
is always a variable I in Schooner, variables start with a 
question mark) rather than a desired value. The user is 
asked for the information needed by the query and the 
variable in the third position is set to that value* During 
the inferences, that value is substituted for the variable 
whenever ii appears. 

The inferences use the binding of variables set in the 
query to update Schooner's beliefs. Inferences may either 
assert beliefs about potential faults in the system or assert 
beliefs about the state of the system configuration. 

Inference Engine 

The Schooner inference engine is designed specif ical I v 
for troubleshooting. It uses a general troubleshooting pro- 
cess (Fig. 4) to investigate potential faults in the system 
being debugged. Schooner applies tests to the system to 
determine more about faults that may exist, The result of 
each test increases or decreases belief in various possible 
faults in the system being debugged. Subsequent tests are 
selected on the basis of such beliefs and other information 
provided by previous tests. 

Data Structures 

The Schooner Inference engine uses several partially in- 
ter! inked data structures. These provide an internal rep- 
resentation of the data link being debugged and maintain 
a consistent set of beliefs about the faults therein. Fhe most 
important data structures are Ihe configuration, the faults, 
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the tests, and the linkage. 

Configuration, The configuration (see Fig. 5) to a hierarchi- 
cal structure that represents the system being debugged, 
See the box on page 46 for a discussion of hierarchies. Each 
node in the configuration, called an entity, is a frame 3 rep- 
resenting a component of the system. 

Slots in the entity frame describe aspects of the slate of 
that component, During the specification stage each slot 
contains the following information: data type, possible val- 
ues , actual values, default value, and cost of determining 
value. During the troubleshooting session, each frame also 
holds the current beliefs regarding that aspect of the entity 
as well as other pertinent information such as how, when, 
and where those beliefs were obtained. 

In addition to the normal slots, each entity has an operat- 
ing summary slot named i n-work in g -state. This slot reflects 
the state of all the slots in the entity that are operating. For 
example* a value of yes in the operating summary slot of 
the data link in Fig, 5 indicates that Schooner was able to 
assume that the data link, the phone link, and both modems 
are in an operating state. 

The operating summary provides a powerful capability 
for describing test dependencies, A test whose require- 
ments assume that the entire configuration was in a work- 
ing state doesn't have to specify each individual entity to 
guarantee useful backtracking- This characteristic allows 
reference to the operability of the configuration at any func- 
tional leveh In the example of Fig, 5. operating summary 



Specify configuration. 



Configuration 
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Establish- system behavior 



Otherwise 
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No 



Apply i he test. 
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\B a fault concluded? 
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Resolve fault. 



Review beliefs (B) 







E53 

Fig. 5. Configuration data structure 

slots could be used to refer to the operability of the config- 
uration, the data link, either modem* or in general, any 
entity in the configuration. 

Faults. Faults are used for characterizing and manipulating 
failures that can occur in a system. They are ajso used for 
heuristic test selection [see below) and provide an explicit 
structure for holding curreril bel iefs of what might be wrong 
in the system. During the specification stage, a fault simply 
contains a description and a list of tests that point to it 
(Fig. 6). 

Faults are integrated into the configuration by their at- 
tachment to entities in the configuration. Faults are at- 
tached to the entity in which they occur [Fig. 7). For exam- 
ple, a fault representing a malfunction in the terminal 
would be attached to the terminal entity, but a fault repre- 
senting a baud rate discrepancy between the terminal and 
the port would be attached to the configuration entity. 
Tests. Tests are pointed to by faults. The outcome of a test 
can either confirm or deny a fault. 

Linkage. The three data structures above describe the es- 
sentials necessary for troubleshooting a physical system. 
Schooner contains an internal description of the system 
being diagnosed, a set of beliefs of potential faults, and a 
way of describing tests to be performed on the system, The 
linkage data structure (Fig. 7 J ties these three elements 
together In provide useful access when and where neces- 
sary, It provides links from an entity m the configuration 
to faults that may exist in the entity, and it provides links 
from a fault to tests that might tell about it. Once a test is 
applied, it finds the faults to which to assign status. 

In summary, the main duties of the linkage data structure 
are to ensure knowledge consistency, guarantee unique- 
ness* and tie the whole thing together, 



Fault: 


Key boa rd -Car d - D t scon ne c ted 


Frequency: 


Rare 


Description: 


Is the keyboard card disconnected? 


Resolved: 


P 1 ug -i nth e- ke y boa rd-ca rd 


Concluded: 


1 nspect - key bo a rd-c a rd 


Eliminated: 


Do - 1 h e-c u rso r-co nt ro 1- keys-move 




the-cursor? Hit-carriage-return 


Indicated; 


Does-controlg-ring-bell? 


End -Fault: 





Fig. 4. Schooner troubleshooting flow diagram 



Fig. 5. An example of a fault in Schooner This fault occurs 
when the card forme keyboard inside the terminal becomes 
loose in its slot 
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Heuristic Test Selection 

With an understanding of the essentials of the Schooner 
data structures, it is now possible to describe more clearly 
some of the other processes in the Schooner troubleshoot- 
ing flow diagram. Fig, 4. 

The test selection process {the box marked A in Fig. 4) 
is critical for rational behavior and proficient troubleshoot- 
ing. Poor selection of tests will irritate the user; make trou- 
bleshooting arduous, and create an impression of irratio- 
nality. This aspect of the system can single-handedly deter- 
mine Ihe difference between acceptability and unaccept- 
ability of an expert system. 

Each state change or belief change caused by a system 
manipulation or test completion causes Schooner to do a 
complete revaluation of all possible tests to ensure an 
action appropriate to the new situation (Fig. 8). Each poten- 
tial fault in the system is given a numeric pursuit desirabil- 
ity , which is a combination of its frequency fa static value 
assigned at specification] and its likelihood (a dynamic 
value received from tests during the session). Each test that 
can further indicate or disindicate the fault is given a 
numeric cost (see below), Schooner then selects the test- 
fault combination that best combines the attributes of low 
user effort (in the test) with high pursuit desirability (in 
the fault). 

The total cost of a test is the effort to do the query plus 
the effort required to put the system in a state to do the 
test. The resulting value is a function of the effort required 
to perform the test from the present state of the system, 
This technique for determining cost tends to minimize 
manipulations to the system by making Schooner take 
maximum advantage of the present state. For example, if 
the user is asked to run a diagnostic to look at the state of 
a port, the effort required to do a system dump will cause 
Schooner to obtain all the Information possible from doing 
the dump before going to another area of investigation. 

Review Beliefs 

When a fault is discovered and subsequently resolved 
in the system being diagnosed. Schooner reviews the infor- 
mation and consequent beliefs that have accumulated dur- 
ing the session. Typically, some of the information gained 
from previous tests and observations is no longer valid. 
Schooner must find the invalid beliefs and retract them. 
The nature of the Schooner data structures makes this pro- 
cess fairly simple. Schooner merely has to determine (ac- 
cording to procedures described below) which tests are no 
longer valid and "disapply" them. This procedure is what 
occurs in the review beliefs process (the box marked B in 
Fig. 41. 



Entity- 



-^ Fault 



■ Test 



Fig. 7. Ltnkage data structure Faults are attached to an en- 
tity. A fault's likelihood results in 3 test's selection The test 
manipulates the entity and infers about the fault. 



For each Fault in possible faults 
Evaluate Fault pursuit desirability 

For each Test that can tell about Fault 



Evaluate Test Cost 
Test -Desirability = 
Fau I (-Desirability Test-Cost 



End For 
End For 
Select Test with Maximum Desirability 

Fig. 8. Schooner heuristic selection algorithm. 

When a test is disappliecL all its results are retracted, 
that is, all status assigned to faults as a result of applying 
the test is retracted. Additionally* the test is made available 
for reap plication. 

There are two reasons for Schooner to d is apply a test. 
The first occurs when the test indicated a fault that was 
subsequently resolved. Since the test results were based 
on the existence of that fault, such actions are necessary 
to maintain consistency. The reapplication of the test 
would be very likely to yield different results, since the 
fault has been remedied. 

The second reason for disapplying a test occurs when 
the test depends on an assumption specified in a require- 
ments clause that the discovery of the fault has con- 
tradicted. In the example test shown in Fig. I, the clause 

?term in -working-state yes 

required an assumption , since it wouldn't be known thai 
the terminal was in a working state. The subsequent discov- 
ery of a resolvable fault in the terminal (for example, being 
in local mode) would cause Schooner to instruct the user 
to resolve the fault. Schooner would then go into the review 
beliefs process, which wouldnotice that the test do-local-loop- 
back-at-modem made an invalid assumption. All beliefs that 
(he test asserted would be retracted and it would be made 
eligible for reapplication. 

The resulting behavior is a natural sorl of backtracking 
and attention control, As Schooner applies tests that indi- 
cate a failure in an area of the link, faults in this area are 
concentrated on. If a fault is discovered and resolved, then 
Schooner backtracks, verifying symptoms that were discov- 
ered earlier to determine the present characteristics of the 
link being diagnosed and to decide where to Investigate 
next. 

These dependencies are a natural representation of the 
use of tests as a unit of knowledge. There are no special 
requirements for the specification of dependencies. All the 
right information gets backed out when it should be 

Behavioral Results 

Schooner has turned oul h> he a competent expert system 
in tlu j domain in which it has been applied. It is effective 
at discovering faults for which it has knowledge. Knowl- 
edge acquisition and formalization, although mmbersome, 
allows Schooner more or less unlimited growth of expertise 
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Hierarchies 



Hierarchies are a potent tool in artificial intelligence applica- 
lions. A hierarchy is an n-ary tree with a root node. Each node 
ss represented as a frame The frames are connected by pointers 
that poini from parent node to child, in a hierarchy, each parent 
(ascendant) node is a generalization of lis descendant nodes. 
To explain this more clearly, this article will discuss two types of 
hierarchy. 

Inheritance Hierarchies 

Inheritance hterarchies are used to describe objects in the real 
world, In an inheritance hierarchy {see Fig. 1), AKO (a kind of) 
tree, higher nodes on the tree represent more general categories 
while lower nodes are more specific. At each level in the hierar- 
chy, nodes contain properties that distinguish them from their 
siblings. Information common to all siblings is inherited from par- 
ent nodes. If the user queries a property of a node in the hierarchy, 
access routines {Fig 2) search for the requested information, 
looking first in the specified node and then in ascendant nodes, 

For example, in Fig 1 , the mammal node would hold the prop- 
erty of being warm-blooded. Any queries about this property to 
the ape, dog, or bat nodes (or nodes descendant from them) 
would result in the correct value being returned. 

This method for representing objects in the real world is used 
widely in artificial intelligence It has proved to be a very powerful 
representation technique for several reasons, It provides enor- 
mous economy of data; each internal representation of every 
kind of bird does not contain information that it breathes, eats, 
mates, flies, etc. This information logically, conceptually, and 
sensibly belongs at higher levels in the hierarchy. Adding new 
objects is very easy. The addition of a new type of hawk called 
the kitty hawk would automatically result in the right sort of as- 
sumptions made about it — it flies, is carnivorous, etc. Only infor- 
mation that differentiates the kitty hawk from other types of hawks 
would actually have to be added to the kitty hawk node. 

Functional Hierarchies 

For troubleshooting, expert systems need sophisticated tech- 
niques for representing the systems they are debugging 
Schooner and IPT (see article, page 48) represent devices they 
are debugging by creating an internal representation that mirrors 
the device's functional properties. Each device (see example in 
Fig. 2 on page 49) rs shown as the union of its major subsystems. 
Each subsystem, in turn, is divided into smaller subsystems until 
it reaches the smallest size of component the troubleshooting 
system can reason about. This type of representation of a device 
is called a functional hierarchy. 



Get Prop From Mode 
(Parameter: Node) 



Does properly exist 
on Node? 



No 



Ascendant-Node: = 
Get- Ascendant-Node (Node) 



Return Property 



Yes 





Is Ascendant- Node 
null? 



Yes 



No 
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( Ascends n l - Node) 

(Recursive Gall) 



Fig. 2. When a user queries a property of a node, access 
routines search for the requested information. 

Functional hierarchies offer several advantages for device rep- 
resentation. First, they allow a dtvide-and-oonquer troubleshoot- 
ing methodology Instead of verifying the operability of each 
subcomponent at the lowest level, the expert (or expert system) 
can determine the operability of one of the subsystems. Only 
when a fault has been isolated to a subsystem does the trou- 
bleshooter then investigate the nodes that lie below the subsys- 
tem. 

Another advantage of functional hierarchies that is used by 
Schooner is that the operability of a node at any level can be 
viewed as the sum of the operability of all nodes descendant 
from it, Schooner automatically maintains this relationship, updat- 
ing its beliefs of the operability of nodes at intermediate and 
upper levels based on the discovery of faults at lower levels. As 
a result, knowledge for Schooner can refer to the operability of 
a device with assurance of appropriate results. For example, if 
the electrical system m Fig. 2 on page 49 was determined to be 
in a working state, Schooner would merely have to assert that 
fact rather than eliminate all the faults in or below that node in 
the diagram This presents a concise way of giving information 
and preserves accuracy even when faults are subsequently 
added or removed. 



Fig. 1, Animal characterization 
hierarchy. 
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with few expected changes to the inference engine or 
knowledge representation technique. 

The following transcript from a sample session demon- 
strates the knowledge base and provides a picture of 
Schooner's behavior with respect to the concepts described 
above. 

Sample Session 

This example demonstrates simple backtracking. 
Problems: remote mode key up and terminal discon- 
nected from cable. 

WELCOME TO SCHOONER! 

« Debugger for asynchronous, point to point, direcl connect, 
terminal problems, >> 

Describe the type of probtem you are experiencing, 

-» inabifrty-to- tog-on 

Device name for reference: PORT 3. CABLE-3 and TERMINAL-3 

-» acknowledge 

Using the cursor control key, move the cursor out from the feft margin. 

Depress the return key, what appears on the screen? 

—* cr-only 

is TERMINAL-3's remote mode key in the down position? 

-* no 

Depress TERMINAL- 3's remote mode key. 

— ' acknowledge 

Using the cursor control key, move the cursor out from the left margin 

Depress the return key, what appears on the screen? 

-* nothing 

Schooner believed a problem bad been solved, It then 
found out what the symptoms were and discovered a prob- 
lem still existed and the symptoms are now different. 

is TERMINAL-3's block-mode key in the down position? 
-* no 

Hold down the Ctrl key and press q 
-* acknowledge 

Is any text coming up on the screen? 
-» no 

Turn off TERMINAL-3, wait 5 seconds, then turn it back on again 
-* acknowledge 

Using the cursor control hey. move the cursor out from the left margin. 
Depress the return key, what appears on the screen^ 
-* nothing 

What is displayed on the first tine of TERMINAL-3? 
— > terminal- ready 

\s TERMINAL-3 connected securely to CABLE-3? 
■ no 

Connect TERMINAL-3 to CABLE-3. 

-* acknowledge 

Using the cursor control key, move the cursor out from Ihe left margin. 

Depress Ihe return key, what appears on the screen? 

-* if -cr-and -colon 



ProWem(s): 

TERMINAL-DISCONNECTED 

TERMINAL-IN-LOCAL-MODE 
resolved. 
THANK-YOU 
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A Rule-Based System to Diagnose 
Malfunctioning Computer Peripherals 

The Intelligent Peripheral Troubleshooter, an expert system, 
currently diagnoses malfunctions in HP disc drives, but 
other devices will be easy to add to its repertoire. 

by George FL Gottschalk and Roy M. Vandoorn 



AS BUSINESSES PLACE more and more reliance on 
their computer systems, downtime has an ever 
greater impact on their profits* In addition to the 
efforts of HP product divisions to make HP products more 
reliable, HP's Knowledge Systems Laboratory has taken a 
dual approach to minimizing customer downtime, aiming 
first to predict failures before they happen, 1 and second, 
to improve the accuracy of the diagnosis. 

To diagnose malfunctioning peripherals more effec- 
tively, the Intelligent Peripheral Troubleshooter [IPT] has 
been developed, IPT is an expert system that performs a 
diagnosis of a malfunctioning peripheral based on aspects 
of its current state. 

EPT is geared to work in three different environments. It 
can be used as a Response Center tool, as an aid to the HP 
service person on-site as an escalation tool, and as a training 
aid. When a customer places a service call, an engineer 
located in one of the Response Centers contacts the cus- 
tomer to collect as much information as possible about the 
problem and determines if an on-site visit is actually re- 
quired. During this session, IPT is used to determine the 
most likely cause of the problem. This helps ensure that 
the service person going on-site will have the correct parts, 
thus reducing the number of multiple-trip Fixes. 

Once an HP service engineer is on-site, IPT may request 
more precise information that can only be obtained by a 
trained engineer. Examples are using a disc service unit T 
measuring voltages across test points, and measuring seek 
times, Like human experts, IPT will be able to make a more 
precise diagnosis if it is given this additional information. 
IPT can also be used by the Customer Engineer Assist Group 
in the Response Center when their expertise is required 
because of problem escalation. 

For any expert system that attacks a class of problems 
to be successful, enough generality must be built in so that 
an entirely new system is not necessary for every variation. 2 
For IPT to be successful, the underlying approach must be 
general enough to be able to deal with all HP peripherals* 
For this reason, IPT has the following three characteristics: 

A peripheral independent inference engine 

A system and component representation of a peripheral 

A knowledge base built by an interactive rule-maker. 

IPT follows the classic expert system model of separation 
of knowledge and the inference engine as shown in Fig* 1* 
This allows the inference engine to be totally device inde- 
pendent. 



Given the relevant knowledge, IPT could troubleshoot a 
wide range of devices, even household appliances such as 
toasters. In this paper, all examples will deal with a fic- 
tional toaster. This substitution enables this paper to ignore 
the technical details of disc drives. 

What Fig, 1 does not show is that there are actually three 
types of knowledge, which are directly correlated with the 
three phases of the inference engine. The three phases can 
be thought of as: narrow the problem down to a logical 
group of failures, explore each of these failures to see which 
is most likely, and perform the repair, 

IPT Knowledge Representation 

In previous attempts to develop systems that could trou- 
bleshoot malfunctions, commonly referred to as symptom- 
cure systems, the diagnoses were reached by a best-fit cor- 
relation between the symptoms and the possible cures. For 
EPT to be a useful tooL it needed to be able to deduce the 
problem much like a human expert. The difficulty, as in 
all expert system development, was to determine what in- 
formation is actually useful and how to represent it. 

It turns out that describing troubleshooting in terms of 
symptoms and cures is too limiting* To describe the process 
better, components and ohservables are more appropriate 
terms. Components are field replaceable or adjustable 
units. Observables are determinable states of the device, 
such as the state of the bread after toasting, and the setting 
of the knob. 

In troubleshooting scripts, isolating the failing compo- 
nent is not a one-step process. Therefore, the components 
are grouped into their logical systems. For example, in the 
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Fig. 1, Separation of IPTs knowledge base and inference 
engine 
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toaster diagrammed in Fig. 2 there are three systems: the 
electrical system, the mechanical system, and the timer 
system. One of the interesting features of systems is that 
one system can be completely contained within another 
system. For example, the components of the timer system 
(the clock and the timer knob) are both completely con- 
tained within the mechanical system. [For simplicity we 
will assume that the timer system will work even if the 
toaster is not plugged in,) Thus, depending on our goals, 
we could solve for the clock as an element of either the 
mechanical system or the timer system, Systems are the 
logical groups that IPX solves for in phase 1 of a trou- 
bleshooting operation. 

There are distinct advantages in using systems to group 
components, First of all, the rules are easier to enter because 
we can think of observables as indicating a typical electrical 
problem rather than specifying the effects on every compo- 
nent within the electrical system. Next, using systems re- 
duces our search space when we try to solve for a compo- 
nent, if IPT believes a problem is in the electrical system, 
for example, it won't waste its time asking questions about 
the timer. IPT will only go after those problems it considers 
interesting enough to solve, 

A further advantage is that this representation brings us 
cine step closer to the real world. Artificial intelligence 
applications often get their results by performing clever 
numerical or logical manipulations that have no underlying 
theory in reality, We can say, however, that IPT under- 
stands that the clock is part of the timer system and that 
it is related to the mechanics of the toaster, Although IPT. 
of course, has no real grasp of the concept of a toaster, and 
the words clock and timer have no meaning to IPT t it is 
hoped that someday IPT will be able to diagnose hardware 
by looking at a device on a functional level. Getting TPT to 
recognize that certain components are related in a special 
wny is a good start. 

Most observables point to multiple components and each 
component is implied by multiple observables. When an 
observable does point to multiple components, it is possi- 
ble to order the components in order of likelihood. A rank- 
ing strategy was developed to reflect this relationship: 



Always 



Given that this observable occurs, then Ihis 
component will always beat fault. 
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Fig. 2. IPT's representation of a toaster, 



Consistently Given that this observable occurs, then this 
component will consistently be at fault. 

Frequently Given that this ohservahle occurs, then this 
component will frequently be at fault. 

Occasionally Given that this observable occurs, then this 
component will occasionally be at fault. 

Rarely Given that th is observable occurs, then this 

component will rarely beat fault. 

Words are used to describe the categories to the experts 
since it is much more natural than dealing with prob- 
abilities. Computers, however, are designed to think in 
terms of exact numbers, not vague terms. Therefore, it was 
necessary to convert the logical categories to probabilities, 
IPT actually goes one step further and uses certainty ratios, 3 

The advantages of certainty ratios are: 
b Comparisons are manipulated more easily 

All observables are weighted equally 

■ Certainty ratios are easier to maintain 

■ Results are mathematically Lurrect, 

The certainty ratios are hidden from the user, since intui- 
tive probabilities are easier for humans to understand. A 
70% probability means something, but a corresponding cer- 
tainty ratio of 2.33 has no intuitive meaning. IPT uses prob- 
ability to rank components and systems. It does not con- 
sider probabilities to be an absolute measure of certainty. 
Thus the diagnosis may not necessarily be halted when a 
component reaches a probability level of 99%, Instead t 
probability is used to choose the most likely path to inves- 
tigate and to halt diagnosis when one component is signif- 
icantly more indicated than all I he other components, This 
method has the major advantage that as long as the relative 
ranking of the components is correct, the exact category in 
which the component appears is not very important. This 
approach also has the consequence that when we can use 
multiple experts, the original work of the first expert does 
not need to be totally redone when the next expert adds 
knowledge. 

Nearly every time something new is observed, that obser- 
vation affects the likelihood that some system or compo- 
nent is at fault. Thus every time an observable is deter- 
mined, the certainty ratios should be updated. Therefore, 
at any time during the diagnosis, it is possible to step away 
and see how strongly IPT believes a particular component 
is at fault. 

The effort involved in determining the state of an observ- 
able can range from trivial to major. It is important that 
during the troubleshooting script the easy questions are 
asked first and the more difficult questions are asked only 
as a last resort. The categories IPT uses to sort observables 
are: trival, easy, hard, and very hard. The guidelines used 
for the categories are: 

Trival Can be done by looking at the peripheral 

Easy Can be done by looking at the logs 

Hard Can be done by running a diagnostic 

Very Hard Requires swapping out a board 

These difficulty rankings are called the cost of the observ- 
able. 
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Within the three types of knowledge, both forward chain- 
ing and backward chaining rules are used. The first type 
of knowledge, which is used to determine the system at 
fault, is called initial observables. These rules can be 
thought of as the reason the user perceives there is a prob- 
lem, They are implemented via forward chaining rules be- 
cause forward chaining rules provide the control for the 
follow-up questions. Examples of initial observables for a 
toaster are; 

Toast pops up but cold 

Toast won't pop up and no heat 

Toast imitating charcoal 

If the user selects the last choice, toast imitating charcoal, 
then we want to investigate the mechanical system. By 
examining the toaster diagram. Fig. 2. it can be seen that 
the determination is between the timer system and the 
spring, Since the objective of this phase is to determine 
the system in which the fault is located, for this example 
this phase is completed. If the user selected toast pops up 
but cold, follow-up questioning is necessary to determine 
the faulty system. 

The second phase attempts to determine the faulty com- 
ponent. Associated with all components are backward 
chaining rules. Each of these rules within the system deter- 
mined in phase 1 needs to be investigated. The difficulty 
ranking for the observables now determines which rules 
are solved for first. In our example, it was agreed that the 
cost of checking the setting of the knob is trival; therefore, 
that is the first line of reasoning. 

Associated with each backward chaining rule is a for- 
ward chaining rule that causes the certainty ratios to be 
updated based on the user's response, Forward chaining 
rules are used here since an unexpected response from the 
user may satisfy another rule. Since forward chaining rules 
automatically fire, the certainty ratios are properly up- 
dated. In other words, these rules have been implemented 
so that the backward chaining rule currently being attempted 
need not correspond to the forward chaining rule that ul- 
timately fires. 

Once the diagnosis has been completed, IPT instructs 
the user how to proceed with the repair. These are called 
hint rules. Hint rules are forward chaining rules and their 
premises are either an indicated component or a set of 
observables. 

The next section examines how the Inference engine uses 
these three types of rules. 

Device Independent Inference Engine 

IPT is based on the troubleshooting strategy of actual 
experts. As discussed earlier, troubleshooting is not a one- 
step process, Our experts find it easier to solve problems 
by thinking of components in terms of a system, For in- 
stance, if the heating element of a toaster does not turn 
red, then all components that have to do with electrical 
connections are suspect. Solving the problem now becomes 
easier if we concern ourselves with only those components. 

Although much of the information is device specific, an 
underlying core of knowledge about the troubleshooting 
of devices exists. This core was coded info the IPT inference 



engine. 

Several terms have been used to describe the code portion 
of IPT. It has been called the driver, the engine, and the 
inference engine, What all of these terms have in common 
is that they imply that the engine is the force that underlies 
IPT T and in this respect they are accurate. IPT's knowledge 
of a device is merely data, It is the driver that manipulates 
that data to derive a diagnosis. As we saw earlier, looking 
directly at the knowledge is similiar to looking at a dump 
of a data base. It does not have meaning until the engine 
interprets the data, Therefore, the only connection the user 
has with IPT is through the inference engine. 

Of course, if the inference engine accessed the knowledge 
by name, it wouldn't be device independent, Instead, we 
have defined a protocol that allows us to access the knowl- 
edge indirectly. The foundation for this protocol was pro- 
vided by HP-RL [Hewlett-Packard Representation Lan- 
guage 4,5 ] and w T ith our extensions constitutes the knowl- 
edge bus. At present, however, the knowledge bus Is very 
primitive and is dedicated to one device at a time. All of 
this complicated structure does have a purpose: it allows 
IPT to be device independent except for the knowledge. 

There are many conflicting demands on IPT's engine. 
The major concern was to limit the search space as quickly 
as possible by eliminating unnecessary questions and ask- 
ing the remaining pertinent questions in a logical order. 
To implement this strategy, IPT's engine is split into three 
distinct phases, as shown in Fig, 3, 

The first phase is the system determination phase. The 
purpose of this phase is to ask general, easy questions until 
either a component or a system is indicated. If a component 
is indicated, then the third phase is entered and the second 
phase is skipped. 

If a component is not indicated, then IPT enters the sec- 
ond phase. Since a system is by definition indicated at the 
start of the second phase, the purpose of this phase is to 
determine the component at fault. Because the component 
determination phase is much more confident of the area 
to search, it can begin asking harder questions of the user, 

Finally, the repair procedure phase is entered. This phase 
generates output from IPT t such as repair hints, the relative 
standings of the various components and systems, and a 
recap of the diagnosis, 

One of the important points to keep in mind is the differ- 
ence in the paths between IPT and traditional troubleshoot- 
ing flowcharts (see Fig, 4]. A traditional troubleshooting 
flowchart emphasizes tests that tend to eliminate areas of 
search space very slowly, and that perform exhaustive 
searches on the components in question. IPT\ on the other 
hand, maintains its flowcharts in a dynamic environment, 
and can choose its next question based on the latest infor- 
mation. It can also choose to perform the easiest possible 
test that will solve for a particular component. Maintaining 
this type of flexibility is an important factor in keeping IPT 
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Fig. 3. IPTs troubleshooting strategy. 
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efficient. 

Phase 1: System Determination 

The first phase that EPT enters is the system determina- 
tion phase. This phase tries to indicate either a system or 
a component by collecting broad and easily gathered infor- 
mation, What IPT Is trying to do is to get an idea of where 
the problem lies, To do this. IPT maintains a list of ques- 
tions to be asked. Forward chain rules are attached to the 
answer of each question so that the probability levels of 
the various components and systems are updated after each 
question, 

Let's assume that a customer calls the Response Center 
with a problem, hi this case, IPT should try to emulate 
what a customer engineer does when the CE first enters 
the customer's kitchen. A good example would be to ask 

Which of these best describes the problem with your toaster? 
The toast won't pop up and no heat 
The toast pops up but is cold 
Toast imitating charcoal. 

Now let's suppose that the customer indicates that the 
problem involves toast that pops up but is cold. When this 
observation is entered into IPT, the forward chain rule 
associated with cold toast is fired. Since the toast does in 
fact pop up, the probability that the problem is with the 
spring is downgraded. Furthermore, the electrical and 
timer systems have added .support. At this time, however, 
neither system is indicated enough to leave this phase and 
search for components. 

Instead , the time has come to search for the next question, 
The question of whether a fork will pop up the toast is 
discarded because we know that the toast has already 



popped up, and IPT only asks necessary questions. IPT 
then finds that the next thing to ask is 

Does the heating element gfow bright orange when 
toast is inserted in the toaster? 

Yes 

No 

Let's assume that the customer says the heating element 
does not glow. The fact that the heating element does not 
glow seems to indicate the problem is m the electrical 
system, and that is sufficient information to leave this phase 
arid enter the component indication phase, 

Although toasters are much simpler devices than disc 
drives, this question is typical of those asked in this phase. 
In the case of disc drives, of course, the user will also be 
asked to obtain the logs so a history of the drive's errors 
can be reconstructed. The general idea of this section is to 
gather only enough information to point to a specific area 
and then leave. Therefore, it is not appropriate to ask the 
customer to take an ohmmeter and measure the resistance 
across the heating element. Also, these questions will not 
usually be able to solve uniquely for a component but will 
only affect the probability levels. These types of behavior 
are left for the next phase. 

Phase 2: Component Determination 

The purpose of the component determination phase is 
to solve for a failing component given an indicated system. 
Of course, this is not always possible 1 so this phase does 
have the ability to exit given certain conditions. What really 
happens is that the component determination phase at- 
tempts to solve for a unique component, and if the chain 





(a) 




Fig. 4. (a) In a traditional troubleshooting tree, the questions are set up in advance and are 

selected solely by the answers to previous questions, (b) IPT's dynamic troubleshooting tree 

chooses the next question from the question bin based on both the current environment and 

the answers to previous questions, 
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of reasoning used to attempt the solve is completed, then 
a conclusion is asserted. 

This is why it was decided to use backward chaining 
rules for this part of the knowledge. Within the indicated 
system, each component is sorted according to its probabil- 
ity factor. Subject to a difficulty ranking of the observable, 
these components are attempted from the most likely down 
to the least likely via backward chaining rules, When a 
component is "proved" to be at fan 1 1 , phase 2 is completed. 

In the toaster example after phase 1 , the electrical system 
was indicated because the heating element did not glow. 
Clearly, IPT should now determine what is wrong with the 
electrical system. Let's assume that IPT has the following 
rules concerning the electrical system: 



heating element is more indicated, it would like to use one 
of the heating element rules, but the difficulty ranking im- 
plies that it should ask the cord rule first because it is 
easier to ask. 

Is the toaster plugged in? 
Yes 
Nq 

If the cord wasn't plugged in then IPT would be able to 
conclude that the cord should have been plugged in. 

To make things more interesting, however* assume that 
the cord was plugged in. Next, IPT examines the Heating- 
Element -Rule and sees that the first premise is satisfied. 



Cord-Rule: IF the cord is not plugged in 

THEN the problem is in the cord 
DIFFICULTY RANKING— Trival 

Heating- Element-Rule: IF the cord is plugged in 

AND IF there is infinite resistance across 

the heating element 
THEN the problem is in the heating element 
DIFFICULTY RANKING— Hard 

H eating- Element-Rule 1 : IF the cord is plugged in 

AND IF the customer declines to state 

what the resistance across the 

heating element is 
THEN the problem is in the heating 

element or the wires connecting the 

heating element to the cord 
DIFFICULTY RANKING— Hard 

Finally, let's assume that at this point the heating element 
is slightly more indicated than the power cord, 

First of all, IPT examines the three rules. Because the 



Using an ohmmeter please measure the resistance across 

the heating element. 
Is the resistance infinite? 

Yes 

No 

Dont know 

If the user answers that the resistance is infinite the rule 
is satisfied and this phase is completed. 

Suppose, on the other hand, that the user didn't have an 
oh mmeler handy. At this point theHeating-Element-Rule-1 
is satisfied and gets asserted, even though IPT wasn't con- 
sidering this rule at the time. This flexibility illustrates 
why there is no logical connection between the forward 
chaining and backward chaining rules. 

The component determination phase is crucial to IPTs 
success. Using backward chaining rules for agenda control 
allows us to eliminate large areas of the search space. Be- 
cause we have narrowed down the possibilities so far* we 
feel confident in asking hard questions, and can isolate 
components to a high degree of certainty. Finally, IPT main- 
tains a great deal of control over when rules are attempted 



* - — + 



I 

I IPT was unable to obtain an exact diagnosis 

I It did, however, find some likely candidates 

| %%X%XXX%%%XXXXXX%%%%%%XXXXX REPAIR PROCEDURES XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

I If the toaster has a serial number below 687, then verify that the 

I orignal 10 amp fuse has been replaced with a 20 amp fuse. 



I 

| ZXXZXXXXXXXXXXXXXXXXXXXX COMPONENT PROBABILITIES XXXXXXXXXXXXXXXXXXXXXXXXXXX 

I 

I 

I 

| The following components have the indicated probability 



I VERY LIKELY 

I 

| WIRE5-FR0M-ELEMEKT 



LIKELY 
HEATING-ELEMENT 



SOMEWHAT LIKELY 



j XXXXXXXXXXXtXXXXXXXXXXXXXX SYSTEM PROBABILITIES XXXXXXXXXXXXXXXXXXXXXXXXXXXX 

The following systems have the indicated probability 

VERY LIKELY LIKELY SOMEWHAT LIKELY 

ELECTRICAL-SYSTEM 



Fig. 5. An IPT diagnosis. 
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so that the cost of performing a test and the present likeli- 
hood of a component are both taken into account. 

Phase 3: Repair Procedure 

When the third phase is entered, LPT's diagnostic process 
is over. The purpose of the repair procedure phase is to 
take the information gathered during the diagnosis and 
print it out in a usable format, in most traditional trouble- 
shooting programs this phase is usually trivial; a message 
or a number may be output on some device, In IPT T s case, 
however, there may be certain instances where additional 
reasoning must be performed. 

The first thing that is printed out is a list of repair hints. 
Repair hints may contain the necessary steps for a repair, 
additional probability information, safety hints, or relevant 
information from a service note. 

Assume in the toaster example that all toasters with a 
serial number below 687 were built with a 10-ampere fuse 
instead of a 20-ampere fuse. Because so few of the toasters 
now have the lO-ampere fuse, it wouldn't make sense to 
include this as a component determination rule. Instead, 
we decide to make it a hint rule that only prints out when 
wp haven't isolated the problem but know it's in the elec- 
trical system. Fig, 5 shows what the 1PT output might took 
like. 

The important thing about hint rules is the intelligence 
they must display. If IPT were to print out all of the hint 
rules for a particular device all of the time, there would be 
so many that the user would refuse to read them, By making 
hint rules fire only under certain conditions, IPT is able 
to make this portion of the output both important and con- 
cise. 

If the component determination phase was successful! 
the next item to be printed out is the indicated diagnosis. 
Since the component information phase cannot always 
uniquely determine a component, after the indicated diag- 
nosis is a list of components grouped according to thoir 
probability. The components are divided into three groups: 
most likely, likely, and somewhat likely. These groupings 
indicate the order in which IPT would like the components 
to be considered. 

This time, however, the grouping also takes the difficulty 



of repair into account. For example, in the diagnosis shown 
in Fig. 5, the heating element is probably more likely than 
the wires connecting the element to the cord* but since 
checking the wires requires only a visual inspection, IPT 
would suggest checking the wires first. 

Finally, the sections that illuminate IPT's reasoning pro- 
cess are printed, The first of these sections is the probability 
ratings of the systems. Next, the justification of any compo- 
nent determination rules is output. Finally, a recap of the 
observations entered into the system is printed. The pur- 
pose of these three sections is to give the users of IPT a 
way of determining why IPT made its diagnosis. 

Conclusion 

The Intelligent Peripherial Troubleshooter has proved 
the feasibility of developing knowledge-based systems to 
troubleshoot peripherals. Currently its diagnostic scope — it 
can diagnose HP 79 2X and HP 793X Disc Drives — is limit- 
ed, but IPT will grow to cover more and more devices. The 
device independent inference engine makes it possible to 
add devices without the need of developing an entirely 
new strategy; 
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Multilevel Constraint Based Configuration 

The goal of My con, a prototype expert system for 
configuring computer systems, is to relieve the support 
engineer of the tedious task of configuration of a customer 
order. 

by Robert I. Marcus 



ANEW ALGORITHM for handling computer system 
configuration problems has been implemented in a 
prototype called Mycon developed at the Hewlett- 
Packard Knowledge Systems Laboratory. 

The original Mycon was written in Portable Standard 
Lisp using the HP AI Workstation environment 3 consisting 
of an objects package, a frame-based package, and an 
EMACS-type editor/ NMOHK. This enviroment made it 
possible to construct an interface in which all the knowl- 
edge about the devices, constraints, and configurations of 
a system is easily accessible to the user. This firs I prototype 
focused on placing devices on I/O channels. 

This prototype has been converted into IIP standard Pas- 
cal and is running on the HP 3000, configuring the HP 
3000 Series 88. The latest version also performs siot, junc- 
tion panel, and cable configuration. The ultimate goal is 
to produce a program that can free the support engineer 
from the tedious task of logical configuration of a customer 
order. 

The fundamental idea of the Mycon algorithm is very 
simple, but the algorithm appears to have a wide range of 
applications- The basic principle is that at every stage of 
the configuration process there are explicit constraints that 
cannot be violated. 

The basic problem solved by Mycon is to construct a 
system configuration that satisfies a series of constraints 
supplied by an expert. The coostraints fall into two differ- 
ent classes: required and suggested. A configuration is feas- 
ible only if it satisfies all of the required constraints. The 
suggested constraints are guides for selecting the best 
choice among ihe feasible configurations. 

The prototype performs the logical configuration of a 
computer system. This problem consists of taking a set of 
devices and attaching them to I/O channels to maximize 
performance at a reasonable cost. Typical required con- 
straints are The maximum number of devices that can tie attached 
to a channel is 6 and High-speed devices must be attached to high- 
speed channels. Typical suggested constraints are Spread the 
high-speed devices evenly and Keep tow-speed devices on tow-speed 
channels. The difficult part of the problem is to use the 
knowledge embedded in the suggested constraints to con- 
trol and evaluate the system configuration. The method of 
multilevel constraints was developed to handle this part 
of the problem. 

Previous work on configuration problems, such as the 
Rl system developed by Digital Equipment Corporation 3 
has used forward chaining rules fed guide the configuration. 
The advantage of a rule-based or constraint-based system 



is the ease of maintenance and updating. In this paper, 
multilevel constraints axe discussed as an alternative or 
supplement to forward chaining. The method can be ex- 
tended to any problem in which there are qualitative pref- 
erences among a range of feasible solutions, 

Basic Method as Used by Mycon 

The steps in the basic method are: 

1 . Choose an initial algorithm that can generate candidate 
partial configurations as steps towards a final configura- 
tion, 

2. Use the required cuoslraints as a pruning mechanism 
to ensure that the initial algorithm will produce only 
feasible configurations. 

3, Add suggested constraints as necessary to maximize the 
quality of the Final configuration and to guide the con- 
figuration away from unsuccessful partial configuration 
that will require backtracking. 

4, If the initial algorithm generates a sufficient number of 
possibilities , and all the required constraints are staled, 
and the suggested constraints are chosen well, then the 
method provides an efficient algorithm for optimal con- 
figuration. 

The idea of multilevel constraints is to assign each con- 
straint to a level determining when it is active and must 
be satisfied. In Mycon, the required constraints are assigned 
a level of 100, Suggested constraints are assigned lower 
levels ranging from to 90 in steps of 10- The program 
begins attaching devices to channels keeping all const ra in ts 
active. If a feasible configuration cannot be completed, then 
constraints at level are dropped, the level is incremented 
by 10, and an attempt is made to attach the remaining 
devices, This process of dropping constraints is continued 
until the configuration is successfully completed or the 
level reaches 100. 

This method offers several advantages over a strictly for- 
ward chaining approach. The number of constraints is 
smaller than would be necessary with forward chaining. 
It is not necessary to introduce additional predicates to 
resolve conflicts and to prevent the premature firing of 
rules. It is possible to evaluate the results of the configura- 
tion by keeping track of the suggested constraints it vio- 
lated. Finally, the user of the system can suggest an alter- 
nate configuration for the program to evaluate and compare 
with the original result, This last feature is very important, 
because it allows a user to determine why the program has 
selected a particular configuration and which alternatives 
are equally suitable. 
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The Mycon Prototype 

The user interface to Mycon is an interactive version of 
the form used by salespersons when taking a customer 
order. The form will only accept orders that satisfy all the 
basic system requirements regarding maximum and mini- 
mum numbers of devices and peripherals. Mycon can also 
determine if an order contains too many devices to be con- 
figured legally and will advise the user on how to reduce 
the order. 

When a correct order has been entered, the program 
chooses the number of high-speed LO channels and the 
system disc following guidelines suggested by an expert 
in configuration. The user is informed of the reasons gov- 
erning these choices and is allowed to overrule them. 

Next, Mycon attempts to place devices on the 10 chan- 
nels using the active constraints to determine if the place- 
ment can be done. The devices axe arranged in an order 
based on heuristics used by experts, in general, devices 
that are subject to more constraints, such as the system 
disc, are placed first. The user is able to watch the devices 
being placed one at a time with levels reported if desired. 

A device is placed on the first I/O channel for wliich no 
active constraint is violated. If a device can't he placed at 
a level, then no further attempt is made at that level to 
place devices of the same type. When all devices possible 
have been attached at a given level, Mycon checks to see 
if the configuration is completed, If any devices remain to 
be placed, the level is raised and constraints below the 
current level are dropped. If the level reaches 100 without 
a successful configuration, the user is informed that it is 
necessary to reduce the number of high-speed I/O channels, 
which will lower the performance of the system. If the user 
decides to continue, the number of high-speed I/O channels 
Is reduced, all devices are removed from the channels, I he 
level is set at G\ and the configuration is restarted. 

This process is repealed until a successful configuration 
is produced or no further reduction is possible. The last 
possibility has been eliminated in practice by using a max- 
imal pa eking estimator on the configurability of the original 
order before starting configuration. 

Finally, the user can reconfigure the system by adding, 
moving, and deleting devices, Mycon will evaluate the 
user's configuration and compare it with the original resuh, 

Examples of Multilevel Constraints 

Multilevel constraints serve a dual purpose in configura- 
tion. In one sense they are a means of knowledge represen- 
tation that is easily understood. The levels convey an esti- 
mate of the importance attached to satisfying a constraint. 
However, the levels are also a control mechanism which 
can be used to govern the dynamics of the configuration 
process. For example, to ensure that high-speed devices 
are spread evenly on high-speed channels, the following 
constraints and levels are used: 

Maximum of one device on high-speed channels Level = 10 

Maximum of two devices on high-speed channels Level = 20 

Maximum of three devices on high-speed channels Level - 30 

Maximum of four devices on high-speed channels Level ■ 40 

Maximum of five devices on high-speed channels Levef - 50 



A fundamental constraint of the system is: 



Maximum of six devices on high-speed channels 



Level - 100 



The placement of a low-speed device on a high-speed 
channel at an early stage of the configuration would not 
strongly affect the performance of a successful configura- 
tion. However, it might stop the later placement of a high- 
speed device and prevent the configuration process from 
succeeding. Therefore, it is important that no low-speed 
devices be placed on high-speed channels until all high- 
speed devices have been attached. Since the constraints 
above may prevent this until the level is greater than 50, 
the following constraint is enforced: 



No low-speed devjees on high-speed channels 



Level = 60 



It is possible to have Mycon maintain its knowledge base 
by insisting that certain inequality relations between con- 
straint levels be enforced. For example: 

Level of (No low- speed devices on high-speed channels) > level of 
(Maxfrnum of five devices on high-speed channels). 

The constraints can be chosen to avoid having devices 
placed on the same channel, For example: 

Minimum of (Number of system discs attached to channel) AND (Number 
of magnetic tape drives attached to channel) = 0. Levef = 90 

(Number of high-speed master discs attached to channel) < = 1. 

Level = 30 

It is also possible to insist that certain devices not be placed 
with those of another type. For example: 



Number of slave discs attached to channel 
discs attached to channel 



OR Number of master 
Level - 100 



Observations 

It has been possible to translate all of the required and 
suggested constraints for device placement into the form 
of multilevel constraints easily. The selection of levels for 
the suggested constraints required some analysis and ex- 
perimentation. The easy-to-utiderstand form of the con- 
straints facilitates addition, deletion, modification, and 
testing by a configuration expert untrained in program- 
ming. 

The expansion of the prototype to handle placement into 
slots and junction panels greatly Increased the knowledge 
required by Mycon. This was accommodated by adding 
constraints to the device placement phase of the program. 
Using these additional constraints made possible a pro- 
cedural implementation of the slot and junction panel con- 
figuration. 

The use of constraints also permitted an easy transforma- 
tion of the program from Lisp to Pascal Using Lisp as a 
development language hastened development of the pro- 
totype by providing flexibility In choosing data structures 
and an excellent debugging environment. The conversion 
into Pascal was done for reasons of portability and main- 
tainability. 
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Conclusion 

The method of multilevel constraints has proven very 
effective for logical configuration of computer systems, It 
combines simplicity and power in knowledge representa- 
tion and control, The method can be extended to many 
problems in which a system is constructed systematically 
using knowledge of a domain. It can be used as a supple- 
ment or alternative to procedural or forward chaining 
methods. The constraints complement these methods by 
telling the program what actions not to take. 

Constraints seem to be a natural way for experts to ex- 
press their knowledge of a domain. By tracing the con- 
straints and the levels it is possible to monitoT very closely 
the progress of the configuration and detect gaps or errors 
in the knowledge* 

The real power of the algorithm has not been fully tested 
by the logical configuration problem. For example, express- 
ing multilevel constraints in mathematical form allows the 
introduction into expert systems of the tremendous amount 
of software constructed for linear and mathematical pro- 
gramming, In the field of artificial intelligence, there has 
been extensive work on constraint satisfaction and propa- 
gation; this could be used to extend the present algorithm. 
In the future, it seems possible that the method of multilevel 
constraints could become a standard tool of knowledge 
engineering, 
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